Apple was caught somewhat off guard as generative AI gained traction. Yet the Cupertino tech giant appears to be actively working on its own large language models, with the aim of expanding its use of the technology in future versions of iOS and Siri.
Reportedly, Apple’s AI researchers have made a notable advance in running Large Language Models (LLMs) on iPhones and other Apple devices with limited memory, using a new technique built around flash memory.
The research paper, titled “LLM in a flash: Efficient Large Language Model Inference with Limited Memory,” was published on December 12, 2023. It drew wider attention earlier this week after it was highlighted by Hugging Face, a prominent platform where AI researchers showcase their work. This is Apple’s second research paper on generative AI this month and follows earlier efforts to enable image-generating models, such as Stable Diffusion, to run on its own chips.
LLMs on iPhones
Until now, running large language models (LLMs) on devices with constrained memory was considered impractical, because the models need large amounts of RAM to hold their data and support memory-intensive operations. In response, Apple researchers devised a solution that uses flash memory, the storage normally used for images, documents, and applications.
According to Apple researchers, “this innovative approach addresses the hurdle of effectively running LLMs that surpass the available DRAM (dynamic random-access memory) capacity. It involves storing the model parameters on flash memory and selectively transferring them to DRAM as needed.”
The complete LLM therefore stays stored on the device, but only the parts needed at a given moment are brought into RAM, with flash memory acting as a form of virtual memory. This is similar to how memory-intensive tasks are managed on macOS.
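The idea of keeping parameters on flash and pulling them into DRAM only when needed can be illustrated with a minimal sketch. This is a toy under stated assumptions, not Apple's implementation: an ordinary file stands in for flash, a dictionary stands in for DRAM, and every name here (`OnDemandWeights`, `write_model`, `ROW_WIDTH`, `NUM_ROWS`) is hypothetical.

```python
import os
import struct
import tempfile

ROW_WIDTH = 4                  # floats per parameter row (toy size, assumed)
NUM_ROWS = 8                   # total rows stored on "flash" (assumed)
ROW_BYTES = ROW_WIDTH * 4      # each float32 takes 4 bytes


def write_model(path):
    """Write NUM_ROWS rows of float32 values; row i holds the value i repeated."""
    with open(path, "wb") as f:
        for i in range(NUM_ROWS):
            f.write(struct.pack(f"<{ROW_WIDTH}f", *([float(i)] * ROW_WIDTH)))


class OnDemandWeights:
    """Keeps only requested rows in memory; everything else stays in the file."""

    def __init__(self, path):
        self.file = open(path, "rb")
        self.dram = {}         # row index -> tuple of floats already loaded
        self.flash_reads = 0   # how many times we had to go to the file

    def row(self, i):
        if i not in self.dram:
            # Seek straight to the row and read just that row's bytes.
            self.file.seek(i * ROW_BYTES)
            data = self.file.read(ROW_BYTES)
            self.dram[i] = struct.unpack(f"<{ROW_WIDTH}f", data)
            self.flash_reads += 1
        return self.dram[i]

    def close(self):
        self.file.close()


def demo():
    """Request rows 2, 5 and 2 again; the repeat is served from memory."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.bin")
        write_model(path)
        weights = OnDemandWeights(path)
        rows = [weights.row(i) for i in (2, 5, 2)]
        reads, resident = weights.flash_reads, len(weights.dram)
        weights.close()
    return rows, reads, resident
```

In this toy run, three requests cause only two file reads, and only two of the eight rows ever occupy memory. A real system would also need an eviction policy and would read far larger chunks.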
Put plainly, Apple researchers worked around the limitation with two techniques that reduce the amount of data transferred and make better use of flash memory throughput:
Windowing: Think of this as a recycling mechanism for data. Instead of loading new data for every step, the AI model reuses part of the data it has already processed for recent steps. This reduces how much has to be fetched from flash and held in memory, making the process faster and smoother.
Row-Column Bundling: This technique involves reading data in larger blocks, akin to reading a text in larger sections rather than one word at a time. Grouping related data together lets it be read from flash memory more quickly, which speeds up the model’s text processing and generation.
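The two techniques above can be sketched in a few lines. This is a simplified illustration rather than the paper's code: the function names (`sliding_window_loads`, `bundle`, `read_bundle`) are hypothetical, "neurons" are plain integers, and the bundled data are short lists standing in for the matrix rows and columns that are always needed together.

```python
from collections import deque


def sliding_window_loads(active_per_token, window):
    """Count neuron loads when neurons used by the last `window` tokens stay cached."""
    recent = deque()           # active-neuron sets for the tokens in the window
    resident = set()           # neurons currently held in DRAM
    loads = 0
    for active in active_per_token:
        active = set(active)
        missing = active - resident        # only these must be read from flash
        loads += len(missing)
        recent.append(active)
        if len(recent) > window:
            recent.popleft()               # oldest token leaves the window
        resident = set().union(*recent)    # drop neurons no recent token used
    return loads


def bundle(rows, cols):
    """Store row i and column i side by side so they sit in one contiguous chunk."""
    return [list(r) + list(c) for r, c in zip(rows, cols)]


def read_bundle(bundles, i, width):
    """A single read of chunk i returns both halves for neuron i."""
    chunk = bundles[i]
    return chunk[:width], chunk[width:]


# Toy activity pattern: which neurons each successive token needs (assumed data).
tokens = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [1, 5]]
naive_loads = sliding_window_loads(tokens, window=0)      # reload everything each time
windowed_loads = sliding_window_loads(tokens, window=2)   # reuse the last two tokens' neurons

bundles = bundle([[1, 2], [3, 4]], [[5, 6], [7, 8]])
row_half, col_half = read_bundle(bundles, 1, width=2)
```

With this toy pattern, the window cuts the number of loads from 11 to 6 because consecutive tokens share most of their active neurons. Bundling does not reduce the amount of data read; it halves the number of separate reads, which matters because flash is faster with fewer, larger reads.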
By combining these methods, the paper says, AI models up to twice the size of a device’s available DRAM can be run. Compared with naive loading approaches, the researchers report inference that is 4 to 5 times faster on the processor (CPU) and 20 to 25 times faster on the graphics processor (GPU).
AI on iPhone
This gain in AI efficiency opens up a range of possibilities for future iPhones, including more advanced Siri capabilities, real-time language translation, and AI-powered features in photography and augmented reality. It could also pave the way for iPhones to run sophisticated on-device AI chatbots and assistants, in line with Apple’s rumored development efforts in this area.