Why

I am a Star Trek fan stuck on Earth because our civilisation has yet to develop warp capability.

As I yearn for interstellar travel, I often find myself dreaming of the advanced AI systems aboard starships like the Enterprise, such as the ship's computer, which effortlessly understands and responds to crew members' instructions without hallucinating or misinterpreting commands.

While we're still far from achieving warp capabilities on Earth, our current AI technology - particularly Large Language Models (LLMs) - offers a glimpse into the potential future of human-computer interaction.

However, much like our journey towards the stars, the path to perfecting LLMs is filled with challenges. Though LLMs are a disruptive technology that bridges communication between humans and computer systems, they are still in their early stages.

We need to solve the following problems to harness their true potential and perhaps take a step closer to truly reliable AI systems:

  1. LLMs hallucinate, producing false or inaccurate information. 
  2. LLMs rely heavily on learned patterns from their massive pre-training datasets, which can limit their ability to generalize to entirely new contexts. While they can exhibit some reasoning and abstraction, extending this into robust, general reasoning remains an open challenge.
  3. LLMs are expensive to train, fine-tune, and run, requiring substantial computational resources and energy consumption. This high cost can be a barrier to accessibility and scalability, limiting their use to organizations with significant resources. 
  4. LLMs are not easily explainable; their decision-making processes are black boxes, making it difficult to understand how they arrive at specific outputs. This lack of transparency raises concerns about trust, accountability, and the ability to diagnose or correct errors.
  5. Current user interfaces are not optimized for AI interaction, necessitating a fundamental redesign of UI/UX principles to fully leverage AI capabilities and accommodate its unique characteristics, such as intelligence, context-awareness and natural language processing.

To address the fundamental issues with Large Language Models (LLMs), we need to examine their foundational components and rethink our approach from first principles. However, several significant barriers make it challenging for researchers and engineers to solve these problems:

  1. The fundamental building blocks and algorithms underlying LLMs are highly complex and require extensive expertise to fully understand and modify. This steep learning curve can deter many potential contributors from engaging in foundational research.
  2. Building an LLM from scratch with foundational modifications is extremely resource-intensive and expensive. This high barrier to entry limits the number of organizations and individuals who can participate in groundbreaking LLM research and development.
  3. While there is ongoing research producing novel and promising ideas, integrating these innovations into existing LLM architectures is challenging. Many improvements require alterations to the foundational architecture, which can be difficult to implement without rebuilding the entire system.
  4. Solving some LLM problems efficiently may require expertise in system-level programming or even hardware-level optimizations.
  5. LLM research is often unorganized and scattered across various institutions, companies, and publications. This fragmentation can be intimidating for newcomers and makes it difficult to get a comprehensive understanding of the field or to identify the most promising areas for contribution.
  6. Redesigning UI/UX for AI interaction requires a paradigm shift in thinking about human-computer interfaces. Existing tools and methodologies may be inadequate for prototyping and testing AI-driven interfaces, further complicating the development process.

To tackle these fundamental challenges in LLM development and democratize access to the technology, I want to work on solutions that leverage community resources, innovative techniques, and user-friendly tools. The strategy encompasses the following key initiatives:

  1. Simplifying LLM Foundations
    • Develop accessible resources that break down the foundational blocks of LLMs.
    • Create step-by-step guides on building LLMs from scratch, making the process more approachable for newcomers.
  2. Exploring Efficient Training Methodologies
    • Investigate techniques for training LLMs with significantly less data, reducing resource requirements.
    • Explore and extend innovative approaches such as neuro-symbolic AI, which combines neural networks with symbolic reasoning (a toy sketch of this idea follows this list).
  3. Developing User-Friendly Tools for LLM Experimentation
    • Build a robust library or no-code tool using Rust, providing a high-performance foundation.
    • Implement friendly APIs and interfaces that allow easy experimentation with LLM architectures at both the foundational and hardware levels (a hypothetical API sketch follows this list).
    • Leverage technologies like AWS FPGA to enable hardware-level optimizations without requiring deep expertise in hardware design.
    • Develop AI-native interface design frameworks and tools to create intuitive, intelligent, context-aware interfaces that fully leverage LLM capabilities.
  4. Enabling Distributed LLM Training
    • Design and implement a framework for distributed LLM training that utilizes community-contributed computational resources (a rough protocol sketch follows this list).
    • Develop protocols and tools for secure, efficient sharing of processing power across a network of volunteers.
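
To make the neuro-symbolic idea from initiative 2 concrete, here is a minimal Rust sketch: a mocked neural component proposes an answer and a symbolic verifier accepts or rejects it before the system trusts it. `mock_llm_answer` and `verify_sum` are hypothetical stand-ins, not part of any existing library.

```rust
// Toy neuro-symbolic loop: a (mocked) neural component proposes, a symbolic
// component verifies. Everything here is illustrative, not a real model call.

/// Stand-in for an LLM answering "what is a + b?". It deliberately
/// "hallucinates" on some inputs to show why verification matters.
fn mock_llm_answer(a: i64, b: i64) -> i64 {
    if (a + b) % 7 == 0 { a + b + 1 } else { a + b }
}

/// Symbolic verifier: checks the proposed answer against an exact rule.
fn verify_sum(a: i64, b: i64, proposed: i64) -> bool {
    proposed == a + b
}

fn main() {
    for (a, b) in [(2, 3), (3, 4)] {
        let proposed = mock_llm_answer(a, b);
        if verify_sum(a, b, proposed) {
            println!("{a} + {b} = {proposed} (accepted)");
        } else {
            println!("{a} + {b}: rejected hallucinated answer {proposed}");
        }
    }
}
```

The point of the sketch is the division of labour: the neural side is allowed to be fallible, while the symbolic side provides a hard guarantee for the cases it can check.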
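
For initiative 3, this is a hypothetical sketch of what a friendly Rust API for assembling LLM variants could look like. `ModelBuilder`, `ModelConfig`, and `Attention` do not exist in any published crate; they only illustrate the kind of declarative, composable interface I have in mind.

```rust
// Hypothetical sketch only: none of these types exist in a published crate.
#[derive(Debug, Clone, Copy)]
enum Attention {
    Full,
    Sliding { window: usize },
}

#[derive(Debug)]
struct ModelConfig {
    layers: usize,
    hidden_dim: usize,
    attention: Attention,
}

#[derive(Default)]
struct ModelBuilder {
    layers: usize,
    hidden_dim: usize,
    attention: Option<Attention>,
}

impl ModelBuilder {
    fn layers(mut self, n: usize) -> Self { self.layers = n; self }
    fn hidden_dim(mut self, d: usize) -> Self { self.hidden_dim = d; self }
    fn attention(mut self, a: Attention) -> Self { self.attention = Some(a); self }

    // Produce a concrete experiment configuration; default to full attention.
    fn build(self) -> ModelConfig {
        ModelConfig {
            layers: self.layers,
            hidden_dim: self.hidden_dim,
            attention: self.attention.unwrap_or(Attention::Full),
        }
    }
}

fn main() {
    // Swap in sliding-window attention without touching anything else.
    let config = ModelBuilder::default()
        .layers(12)
        .hidden_dim(768)
        .attention(Attention::Sliding { window: 256 })
        .build();
    println!("experiment config: {config:?}");
}
```

The goal is that changing one architectural piece (here, the attention variant) should be a one-line edit, not a rebuild of the whole system.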
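
And for initiative 4, a rough sketch of the message types a volunteer-compute training protocol might exchange. The names and fields (`WorkUnit`, `GradientUpdate`) are hypothetical; a real protocol would also need authentication, gradient verification, and fault tolerance.

```rust
// Hypothetical message types for distributing training work to volunteers.

/// A unit of work the coordinator hands to a volunteer node.
#[derive(Debug)]
struct WorkUnit {
    shard_id: u64,          // which slice of the training data to process
    model_version: u64,     // which parameter snapshot to compute against
    batch_indices: Vec<u64>,
}

/// What the volunteer sends back after local computation.
#[derive(Debug)]
struct GradientUpdate {
    shard_id: u64,
    model_version: u64,
    // In practice this would be compressed or quantized gradients;
    // here it is just a placeholder vector.
    gradients: Vec<f32>,
}

/// Coordinator-side check: only accept updates computed against the current
/// model version, so stale contributions can be discarded.
fn accept_update(current_version: u64, update: &GradientUpdate) -> bool {
    update.model_version == current_version
}

fn main() {
    let work = WorkUnit { shard_id: 42, model_version: 7, batch_indices: vec![0, 1, 2] };
    println!("dispatched: {work:?}");

    let reply = GradientUpdate { shard_id: 42, model_version: 7, gradients: vec![0.01, -0.02] };
    println!("accepted: {}", accept_update(7, &reply));
}
```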

My vision is to evolve LLMs beyond their hallucination-prone state, enabling reliable performance of critical tasks through neuro-symbolic approaches. I aim to create intelligent AI tools that are free and open for all to use and improve.

Beam up to my community if you want to be part of solving these problems.