Category: AI

  • RAG Enabled WordPress in Core Could Transform WordPress from CMS to AIMS

    Sometimes you see an idea in an area you’ve been thinking about deeply, and it doesn’t just click. It’s a puzzle piece that absolutely thunders into place. You take a step back, the puzzle is complete, and what was an abstract image has crystalized into a clear vision of an inevitable future. That just happened with a tweet by James LePage from 5 hours ago.

    I’ve been spending a huge amount of time reading about AI, as have many of us. I’ve also been working hard with a product team, which gives me the opportunity to turn what I’m learning into applied knowledge. The benefit of this is that you’re taking a hard look at the tech landscape, identifying real problems, and matching them to applied solutions using the latest (meaning this week’s) ideas and technology.

    I suspect that James mis-tweeted when saying WordPress IS a vector database, but I think we all know what he means – and the concept here is incredibly exciting. WordPress:

    • Is called a content management system, but it can just as well be described as a document management system.
    • Is one of perhaps the two most popular document management systems on Earth – with Google Docs being the other.
    • Has more documents under management than perhaps any other platform, already uploaded, categorized and tagged along with metadata for each doc.
    • Has an app store-like dev ecosystem with the plugin repository.
    • Has several hundred million installs with billions of human visitors to those installs (websites).
    • Is an online application that runs 24 hours a day and provides an interface for humans and machines to interact with.
    • Has a mature security ecosystem (shout out to my company Wordfence) with many vendors and solutions.
    • Has solved high performance document storage and retrieval at scale on a live site with live editing.

    Retrieval Augmented Generation, or RAG, is the process of turning documents into embeddings (an array of numbers) which represent the meaning of each doc or chunk of text, storing those numbers with an index referencing each doc or chunk in a database, and then retrieving the documents or chunks to augment prompts sent to AI. It works like this:

    A company has specific knowledge in a big document database. They vectorize the whole lot (generate embeddings for the docs or chunked docs), store it all, and create a RAG application. You come along and interact with, let’s say, a chatbot. Here’s what happens:

    • Your question is turned into an embedding (an array of numbers) representing its meaning.
    • That embedding is used to retrieve chunks of text in the vector database that are related to your question.
    • The chunks related to your question are used to create an AI prompt that reads something like “Here is a user’s question and a bunch of documents related to the question. Use this knowledge along with your own to answer the user’s question as best you can.”
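
    The three steps above can be sketched in a few lines of Python. This is only a toy under loud assumptions: `embed` here is a word-count vector standing in for a real embedding model (so it matches on shared words, not on meaning), the sample chunks are made up, and the final prompt would be sent to whatever AI model you use rather than printed.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a sparse word-count vector.

    A real model returns a dense array of floats that captures meaning;
    this version only captures which words appear.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Vectorize every chunk once and keep it next to its text."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, question, k=2):
    """Step 1 and 2: embed the question, return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Step 3: augment the question with the retrieved chunks."""
    context = "\n".join(f"- {c}" for c in chunks)
    return (
        "Here is a user's question and a bunch of documents related to the "
        "question. Use this knowledge along with your own to answer the "
        "user's question as best you can.\n\n"
        f"Documents:\n{context}\n\nQuestion: {question}"
    )


# Made-up chunks standing in for a site's existing content.
chunks = [
    "Our firewall plugin blocks malicious login attempts.",
    "The bakery opens at 7am on weekdays.",
    "Two-factor authentication protects WordPress login pages.",
]
index = build_index(chunks)
question = "How do I protect my login page?"
top = retrieve(index, question, k=2)
prompt = build_prompt(question, top)
print(prompt)
```

    Swapping the toy `embed` for a real embedding model is the only change needed to retrieve on meaning rather than on shared words; the rest of the flow stays the same.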

    That’s it. That’s RAG. But two things make it super powerful:

    1. That we can represent semantic meaning in a way that lets us retrieve on similarity in meaning. That’s breakthrough number one.
    2. That we can retrieve documents similar in meaning to a question and augment the knowledge of an AI as part of our question to that AI model.

    Put differently: Individuals and organizations can immediately put their SPECIFIC KNOWLEDGE to work and that can be a differentiator for them in this world of AI models. It’s not just your little business providing an interface into Sam Altman’s latest model. It’s your little business with its specific knowledge providing a differentiated AI powered application, because your AI knows things that others don’t and perhaps never will. That’s why RAG rocks.

    Here’s the thing about WordPress: Every WordPress website is a collection of specific knowledge that in many cases is extremely high quality and has taken years to accumulate. If you could put that specific knowledge to work in an AI context, WOW! You don’t even need to collect and organize the docs. You have already done the work to collect and index the knowledge. You just need to use RAG to feed it to an AI for each prompt, whether you’re generating new content, answering user questions, whatever.

    So what James tweeted was a user interface, probably in working condition, of RAG for WP, and I think he’s doing it in the context of this being added to WP core, since he works for Automattic. But the message here is really: Hey, consider turning one of the largest document management systems in the world into an enabler for RAG apps, using the existing dev ecosystem, massive deployed base, and massive collection of documents, and overnight turn this into the largest RAG AI dev ecosystem in the world.

    What I mean specifically is this:

    • Document vectorization would be as easy as it appears in James’ screenshots.
    • RAG retrieval would be available in the core API.
    • WordPress plugins would immediately be able to build applications around fetching chunks of text from the existing knowledge a site has and sending RAG-augmented prompts to any AI interface, whether it’s self-hosted, open source, closed source, a REST endpoint or running locally.
    • In other words, WP developers would immediately be able to put the specific knowledge that hundreds of millions of websites have spent years collecting to work for the benefit of the site owner.

    The value of a WordPress website immediately increases by an order of magnitude.

    There are challenges that need to be solved. Specifically:

    • What model is generating the embeddings and where is it run? Local? API endpoint? Does it have vendor lock-in or is it open source? Does the host have vendor lock-in or is it open source? Ideally it would run on CPU and be usable directly from PHP so no new ops dependencies are introduced.
    • Is the orchestration of vectorization and retrieval in PHP? Or is it Python, which may not be available on a WP site? Ideally all PHP so that Python is not a dependency for existing sites.
    • How is retrieval being done? pgvector, which adds PostgreSQL as a dependency? Or some kind of MySQL/MariaDB magic I’m not aware of? (MySQL and MariaDB have not traditionally supported vector retrieval or indexing.) Ideally you wouldn’t add a new DB engine as a dependency.
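
    One dependency-free option for the retrieval question, sketched here as an assumption rather than anything James or WP core has proposed: store each embedding as JSON text in an ordinary table and do the similarity search as a brute-force scan in application code. The sketch uses Python with an in-memory SQLite database standing in for the site’s MySQL/MariaDB database. The table name `post_embeddings` and the three-number vectors are hypothetical.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# SQLite stands in for the site's existing relational database.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE post_embeddings (post_id INTEGER, chunk TEXT, embedding TEXT)"
)

# Hand-made 3-number vectors; a real embedding has hundreds of dimensions.
rows = [
    (1, "Hardening wp-login.php", [0.9, 0.1, 0.0]),
    (2, "Sourdough starter basics", [0.0, 0.2, 0.9]),
    (3, "Setting up two-factor auth", [0.8, 0.3, 0.1]),
]
db.executemany(
    "INSERT INTO post_embeddings VALUES (?, ?, ?)",
    [(post_id, chunk, json.dumps(vec)) for post_id, chunk, vec in rows],
)


def nearest(db, query_vec, k=2):
    """Score every stored embedding against the query and keep the top k."""
    scored = []
    cursor = db.execute("SELECT post_id, chunk, embedding FROM post_embeddings")
    for post_id, chunk, emb in cursor:
        scored.append((cosine(query_vec, json.loads(emb)), post_id, chunk))
    scored.sort(reverse=True)
    return [(post_id, chunk) for _, post_id, chunk in scored[:k]]


results = nearest(db, [1.0, 0.2, 0.0])
print(results)
```

    The trade-off is that every query touches every row, so this only suits sites with a modest number of chunks. A dedicated vector index avoids the full scan at the cost of the extra dependency discussed above.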

    If you can eliminate the dependencies, you can deploy a new version of WP core overnight and enable this on every site, worldwide, for immediate use.

    There’s a massive opportunity here if hosting providers collaborate with WP core to provide local GPU resources to generate embeddings along with pgvector for retrieval. It’s a whole new source of revenue for them. At the last WCUS I literally went around and asked every host if they’re providing GPU hosting and only one said they are. It’s wide open for the taking and it’s worth billions, since GPU is far more expensive than CPU to rent.

    It’s quite possible that James’ tweet will be a harbinger of the transformation of WordPress from CMS to AIMS (AI management system).

  • The Nanny Scale

    Let’s say you’re writing a Slackbot that plugs into an LLM. When you send the LLM the prompt, instead of only sending the user message to the LLM, you could send all the Slack data associated with that message including the message itself. Then you could give the LLM tool calling access to the Slack API to perform a range of lookups using data from the Slack request, including for example the Slack user ID of the message sender. To continue this example, the LLM might use the API to look up the username and full name of the message sender, so it can have a more natural conversation with that person.

    For the moment, forget about the specifics of implementing a Slack user interface. The question here is, do we give the model all the data and let it call tools to do things with that data when it determines that would be helpful in building a response? Or do we nanny the model a bit more (provide more care and supervision) and only give it a prompt that we’ve crafted, without ALL the available metadata?

    I’m going to call this The Nanny Scale and suggest that as models continue to get smarter we’ll move more towards increasing model responsibility. It also varies based on how smart the model you’re using is. If it’s an o1 Pro level model with CoT and tool calling capability, maybe you want to give it all the metadata and as many tools as you can related to that metadata, and just let it iterate with the tools and the data until it decides it’s done and has a response for you.

    If you’re using a small model and then further quantizing it to fit into available memory, thereby risking reducing its IQ even further, you probably want to nanny the model. That means you increase the care and supervision and reduce the responsibility the model has: you pre-parse the data, removing anything that could cause confusion even if it might be useful, and you reduce the available tools, if you provide any at all.
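
    To make the scale concrete, here is a minimal Python sketch of how an application might turn a single nanny setting into what the model actually receives. Everything in it is an assumption for illustration: the three levels, the event field names, and the tool names are hypothetical, and none of them are real Slack API methods.

```python
from dataclasses import dataclass, field

# Hypothetical tool names for illustration only.
ALL_TOOLS = ["lookup_user", "lookup_channel", "search_history"]


@dataclass
class ModelRequest:
    prompt: str
    metadata: dict = field(default_factory=dict)
    tools: list = field(default_factory=list)


def build_request(event, nanny_level):
    """Decide what the model sees.

    nanny_level 0: full responsibility (all metadata, all tools).
    nanny_level 1: a little supervision (a few fields, one tool).
    nanny_level 2: maximum supervision (crafted prompt only).
    """
    text = event["text"]
    if nanny_level <= 0:
        return ModelRequest(prompt=text, metadata=dict(event), tools=list(ALL_TOOLS))
    if nanny_level == 1:
        keep = {k: event[k] for k in ("user_id", "channel_id") if k in event}
        return ModelRequest(prompt=text, metadata=keep, tools=["lookup_user"])
    # Pre-parsed, crafted prompt with no metadata and no tools.
    return ModelRequest(prompt=f"Answer the user's message concisely:\n{text}")


# A made-up incoming chat event.
event = {
    "text": "When is the next release?",
    "user_id": "U123",
    "channel_id": "C456",
    "ts": "1700000000.0001",
    "team_id": "T789",
}

for level in (0, 1, 2):
    request = build_request(event, level)
    print(level, sorted(request.metadata), request.tools)
```

    Keeping the setting in one place like this means that moving to a smarter model is a one-line change rather than a rewrite of the prompt pipeline.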

    It’s clear that a kind of Moore’s Law is emerging with regard to model IQ and tool calling capabilities. Eventually we’re going to have very smart models that are very cheap and that can handle having an entire API thrown at them in terms of available tools. But we’re not there yet. Models are expensive, so we like to use cheaper, less capable models when we can, and even the top performers aren’t quite ready for 100% responsibility.

    So as we’re building applications we’re going to have to keep this in mind. We’ll launch v1, models will evolve over several months, and for v2 we’re probably going to have to slide the nanny scale down a notch or two or risk shielding our customer from useful cognitive capabilities that are revealed when a model takes on more responsibility.

  • Amidst the Noise and Haste, Google Has Successfully Pulled a SpaceX

    In 2013 Google started work on TPUs and deployed them internally in 2015. Sundar first publicly announced their existence in 2016 at I/O, letting the world know that they’d developed custom ASICs for TensorFlow. They made TPUs accessible to outside devs via Google Cloud in 2017 and also released the second generation that same year. And since we’re plotting a timeline here, the Attention is All You Need paper that launched the LLM revolution was published in June of that same year.

    OpenAI got a lot of attention with GPT-4, a product based on the AIAYN paper, putting LLMs on the map globally, and Google has taken heat for not being the first mover. OpenAI last raised $6.6B at a $157B valuation late last year, which incidentally is the largest VC round ever, and they did this on the strength of GPT-4 and a straight line trajectory that GPT-5 will be ASI and/or AGI, or close enough that the hair splitters won’t matter.

    But as OpenAI is lining up Oliver Twist style, asking NVidia “please sir, may I have some more” GPU for their data center, Google has vertically integrated the entire stack, from chips with their TPUs, to interconnect, to the library (TensorFlow), to the applications that they’re so good at serving to a global audience at massive scale with super low latency, using water cooled data centers that they pioneered back in 2018 and which NVidia is getting started with.

    Google has been playing a long game since 2013 and earlier, and doesn’t have to create short term attention to raise a mere $6 billion because they have $24 billion in cash on their balance sheet, and that cash pile is growing.

    What Google has done by vertically integrating the hardware is strategically similar to SpaceX’s Starlink, with vertically integrated launch capability. It’s impossible for any other space based ISP to compete with Starlink because they will always be able to deploy their infrastructure cheaper. Want to launch a satellite based ISP? SpaceX launched the majority of the global space payload last year, so guess who you’re going to be paying? Your competition.

    NVidia’s margin on the H100 is reportedly around 1000%. That means they’re selling it for roughly 10X what it costs to produce. Google are producing their own TPUs at scale and have been for 10 years. Google’s TPUs produce slightly better performance than NVidia’s H100 and are probably on par when it comes to dollar per compute. Which means Google is paying roughly 10X less for GPU compute than their competitors.
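
    A quick sanity check on that arithmetic, as a small Python sketch. It assumes the percentage is a markup over unit cost, which is an interpretation and not something established here. Under that reading, a 1000% markup is an 11X price and a 10X price is a 900% markup, so “roughly 10X” holds either way.

```python
def price_multiple(markup_pct):
    """Sale price as a multiple of unit cost, for a markup over cost."""
    return 1 + markup_pct / 100


def markup_percent(multiple):
    """Markup over cost, in percent, for a price that is `multiple` x cost."""
    return (multiple - 1) * 100


print(price_multiple(1000))  # 11.0
print(markup_percent(10))    # 900
```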

    And this doesn’t take into account the engineering advantages derived from having the entire stack from application to chips to interconnect all in-house, and how they can tailor the hardware to their exact application and operational needs. When comparing NVidia to AMD, the former is often described as having a much closer relationship with developers and releasing fixes to CUDA on very short timelines for their large customers. At Google, the chip vendor and its largest customer are the same company.

    As a final note, I don’t think it’s unreasonable to consider the kind of pure research that drives AI innovation as part of the supply chain. And so one might argue that Google has vertically integrated that too.

    So amidst the noise and haste of startups and their launches, remember what progress there may be in silence.

  • My 2025 AI Predictions

    The $60 million deal that Google cut with Reddit will emerge as incredibly cheap as foundational model providers realize amidst the data crunch that Reddit is one of the few sources of constantly renewed expert knowledge, with motivated experts in a wide range of fields contributing new knowledge on a daily basis for nothing more than social recognition. The deal is non-exclusive as was demonstrated by a subsequent deal with OpenAI, meaning Reddit will begin to print money.

    Google’s vertical integration of hardware via their TPUs, their software applications, and their scientists inventing the algorithms that underpin the AI revolution is going to begin to pay off. Google will launch a number of compelling AI applications and APIs in 2025 that will take them from an academic institution creating algorithms for others, to a powerhouse in the commercial AI sector. Their cost advantage will enable them to deliver those applications at a far lower price to their customers, and in many cases, completely free. Shops like OpenAI lining up for NVidia GPUs will be the equivalent of a satellite ISP trying to compete with Starlink who have vertically integrated launch capability.

    DeepSeek will continue to demonstrate unbelievable cost reductions after delivering V3 for less than $6 million as the group of former hedge fund guys continues to sit in a room and simply outthink OpenAI, which has been hemorrhaging talent and making funding demands approaching absurdity.

    OpenAI will be labeled the Netscape of the AI revolution and be absorbed into Microsoft at the end of the year. But like Netscape, many of their ideas will endure and will shape future standards.

    As companies like Google and High-Flyer/DeepSeek prove how cheap it is to train and operationalize models, there will be a funding reset. Companies like Anthropic, who raised a $4 billion series F round from Amazon in November, will need to radically reduce costs, and we may see down rounds.

    We will see new companies emerge that provide tools to implement o1 style chain of thought in a provider and model agnostic way. Why pay o1 token prices for every step in CoT when some of the steps can be done by cheaper (or free) models from other providers?
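
    A minimal Python sketch of what such a tool might do, under stated assumptions: the two models are stubs rather than real provider clients, the model names and per-call costs are made-up units, and routing on a hand-assigned step kind is just one possible policy.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Model:
    name: str
    cost_per_call: float  # made-up units, purely illustrative
    run: Callable[[str], str]


def make_stub(name):
    """Stand-in for a real provider client: echoes the prompt it was given."""
    return lambda prompt: f"[{name}] {prompt}"


cheap = Model("cheap-model", 1.0, make_stub("cheap-model"))
strong = Model("strong-model", 20.0, make_stub("strong-model"))


def route(step_kind):
    """Send only the hard reasoning steps to the expensive model."""
    return strong if step_kind == "reason" else cheap


def run_chain(steps):
    """Run each step in order, feeding the previous output forward."""
    context = ""
    total_cost = 0.0
    trace = []
    for kind, instruction in steps:
        model = route(kind)
        context = model.run(f"{instruction}\n{context}".strip())
        total_cost += model.cost_per_call
        trace.append((model.name, kind))
    return context, total_cost, trace


steps = [
    ("extract", "List the facts in the question."),
    ("reason", "Work out the answer from the facts."),
    ("format", "Rewrite the answer as one short paragraph."),
]
answer, cost, trace = run_chain(steps)
print(trace)
print(cost)
```

    With these made-up prices the routed chain costs 22 units, against 60 if every step went to the strong model, which is the whole argument for doing this in a provider and model agnostic layer.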

    China will continue to rival the USA in AI research and in shipped models. The new administration will rethink the current limits on GPU exports, which will prove ineffective at accomplishing their goal of slowing the competition.

    And finally my personal hope is that the conversation around the dangers of AI will shift from a fantastic Skynet scenario to the practical reality that out of the $100 trillion global GDP, $50 trillion is wages, and that is both the size of the AI opportunity and the scale of the global disruption that AI will create as it goes after human labor and human wages.

    We need to acknowledge this reality and hold to account disingenuous companies and founders who are distracting from this through AGI and ASI scaremongering. This “look at the birdie while we steal your jobs” game needs to end. The only solution I’ve managed to think of is putting open source tools and open source models in the hands of the workers of the world to give them the opportunity to participate in what could, long term, become a utopian society.