The emergence of OpenAI’s ChatGPT has put chatbots at the forefront of Artificial Intelligence (AI) interactions today. It seems that conversing with an AI chatbot is the primary means of interacting with AI models and intelligent systems. While I acknowledge that a chatbot provides a structured, user-friendly interface for most users to engage with an AI model, the possibilities for interacting with an intelligent system are not confined to a text chatbox.
Microsoft has enthusiastically embraced the trend of incorporating AI chatbots into its products, exemplified by the integration of Windows Copilot, an AI chatbot powered by OpenAI’s models, into Windows 11. Notably, Microsoft has replaced Cortana with Windows Copilot in its latest operating system. Additionally, the tech giant has extended the integration of Windows Copilot to Windows 10, supplanting Cortana in the process.
Microsoft appears to view AI chatbots as the future, but questions arise regarding whether this aligns with the vision of intelligent computing driven by AI. Some speculate that Microsoft may be leveraging AI chatbots to capitalize on the AI hype and demonstrate to investors its commitment to AI. However, the current state of AI-driven chatbots has limited practical application, particularly in providing meaningful assistance at the operating system level.
Is Windows Copilot a Step Backward from Cortana?
Microsoft made the decision to phase out Cortana, a product with a nine-year history, in favor of Windows Copilot. However, is Windows Copilot a suitable replacement, especially considering it is still in the preview stage?
Let’s delve into a point-by-point comparison. At its inception, Cortana primarily functioned as a voice assistant, while Windows Copilot operates as a text-based AI chatbot. It does support voice input, but that is not activated by default.
In simple terms, Windows Copilot lacks a voice-first user experience, providing a disjointed interaction compared to the more personalized feel of Cortana. Many users prefer voice input for its ease of use and intuitive nature, making Windows Copilot’s reliance on text input a significant drawback in terms of UI approachability.
Regarding features, Cortana had evolved into a comprehensive product with extensive system-level capabilities. It could execute tasks such as setting timers, alarms, reminders, composing emails, defining terms, launching applications, and more. Essentially, Cortana was deeply integrated into the Windows OS, possessing a thorough understanding of system functions and commands.
In contrast, Copilot relies on general-purpose large language models (LLMs) that are not specifically optimized for executing local actions on Windows. For example, when asked to set a timer, Windows Copilot directs users to an online service rather than performing the task locally. Similarly, it cannot set an alarm, and when asked to play music it merely opens the Spotify app. These limitations suggest that Copilot may not offer the level of AI magic users expect.
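To make the difference concrete, here is a minimal Python sketch of what handling such requests locally might look like. It is purely illustrative and says nothing about how Copilot or Cortana is actually built: the `LocalAction` type, the `route` function, the action names, and the regular expression are all hypothetical, and a real assistant would rely on a trained intent model rather than pattern matching.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalAction:
    """Hypothetical description of something the OS could do directly."""
    name: str
    argument: Optional[int] = None  # e.g. timer length in seconds


# Hypothetical pattern for timer requests (assumption, not a real Copilot rule).
_TIMER = re.compile(
    r"set (?:a )?timer for (\d+) (second|minute|hour)s?", re.IGNORECASE
)
_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}


def route(utterance: str) -> Optional[LocalAction]:
    """Return a local action for a recognized request, or None to fall back to chat."""
    match = _TIMER.search(utterance)
    if match:
        amount = int(match.group(1))
        unit = match.group(2).lower()
        return LocalAction("start_timer", amount * _UNIT_SECONDS[unit])
    if "dark mode" in utterance.lower():
        return LocalAction("toggle_dark_mode")
    # Anything unrecognized is left for the general-purpose chatbot.
    return None
```

In this sketch, a request such as "Set a timer for 5 minutes" resolves to a `start_timer` action carrying 300 seconds, while an open-ended question returns `None` and can be handed to the chatbot.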
Certainly, Windows Copilot is still in its preview stage, and it’s likely that additional features will be incorporated in the future, some of which are already being tested in Insider builds. However, the question remains: why the rush to replace Cortana with a chatbot that’s still in its infancy?
It appears that Microsoft is eager to jump on the AI train, reminiscent of its past regrets in missing out on the smartphone race, and is determined not to make the same mistake again.
What’s concerning is the apparent lack of careful consideration given to Windows Copilot. It seems that Microsoft has simply integrated a chatbot without much refinement, at least for the time being. Microsoft did not bring Copilot to feature parity with Cortana before phasing out the latter, which is disappointing given Cortana’s nearly decade-long presence.
The introduction of a Copilot key on the Windows keyboard, touted by Microsoft as a “significant change to the Windows PC keyboard in nearly three decades,” feels like a missed opportunity for meaningful integration.
Where’s the AI Magic in Windows Copilot?
Let’s explore the capabilities of Windows Copilot. Users can inquire about various topics and receive instant answers. Additionally, by switching to Creative mode, they can engage with the robust GPT-4 model.
Copilot can perform a variety of tasks such as summarizing webpages, extracting key insights, and planning itineraries. Microsoft has also integrated a screenshot tool into Copilot, which utilizes the GPT-4V model for visual analysis. This tool can be used for optical character recognition (OCR) or to gather information about an image.
In terms of Windows-specific functionalities, you can prompt Copilot with statements like “I’m experiencing audio issues,” and it will initiate the audio troubleshooter. This troubleshooting capability extends to other Windows-related issues as well. Additionally, Copilot enables users to toggle dark mode, capture screenshots, and manage window snapping.
While these features are impressive for a preview version, all but the Windows-specific ones also work in Edge Copilot. Moreover, Windows Copilot is unable to access webpages from Chrome or other browsers. Because it runs on Edge’s engine, it cannot access content from other windows, including browsers, Notepad, or Office apps.
Another significant limitation of Windows Copilot’s implementation is that it was not built with the WinUI 3 framework to deliver a native experience. Instead, Copilot runs as an extension of the Edge browser. Consequently, Windows Copilot is not deeply integrated into key elements of the operating system.
For instance, users cannot right-click on a file in File Explorer and ask Windows Copilot to explain it, convert its format, or perform any other action. It would have been a remarkable feature if users could, for example, throw an Excel file at Copilot from the context menu and have it conduct data analysis directly. Currently, however, aside from images, there is no means to interact with files using Windows Copilot on Windows 11.
Windows Copilot: A Tale of Overpromising and Under-delivering
In recent times, Microsoft has excelled at announcing and marketing new features, yet there has been a disconnect when it comes to actually using the promised functionality. Three months ago, when Windows Copilot was unveiled, Microsoft promised several new features. However, these features are either still unavailable or do not operate as advertised.
For instance, when asked to snap windows, Windows Copilot prompts for permission but only snaps one window, leaving the user to complete the rest of the action. Similarly, it fails to play mood-specific music on request during work sessions. Instead of providing tailored music, Copilot simply offers links from YouTube and other sources. This falls short of the expectations for an intelligent AI-powered Copilot.
Furthermore, the highly anticipated contextual menu for Copilot has yet to materialize. Functions like Rewrite, Explain, and Summarize remain unavailable for any active window. Similarly, the promised Draft with Copilot feature is conspicuously absent even three months after release. Additionally, capabilities such as background removal from images and support for extensions have not been implemented.
Consequently, despite the marketing hype surrounding these features, they are notably absent in reality. It appears to be a classic case of Microsoft overpromising and under-delivering with many of its products.
What Might be the Vision for Windows Copilot?
Let’s consider what Windows Copilot could become. Looking at what the open-source community has developed, we find an intriguing tool called Open Interpreter. This tool can work with local files, handling tasks such as file format conversion, processing various formats, generating charts, and much more. Moreover, it can interact with diverse system settings and tools, enabling actions to be performed on Windows.
Recently, a new version of Open Interpreter (0.2.0) was launched, featuring an intriguing OS mode. This mode allows users to operate their computers using simple natural language prompts. Open Interpreter leverages vision models like GPT-4V to comprehend the GUI environment and execute actions on the computer.
For instance, users can instruct it to activate dark mode, prompting Open Interpreter to open the relevant Settings page and toggle the setting using the vision model.
You could simply ask it to play some lo-fi music, and it would open your browser, navigate to YouTube, and find some fantastic lo-fi playlists to play for you. These examples only scratch the surface of what vision models can achieve. However, Windows Copilot appears limited to offering text-based responses within the chatbox.
A truly intelligent Copilot should possess the capability to send emails, adjust Windows settings, interact with the OS at the system level, and offer a myriad of other functionalities. The potential use cases are limitless, presenting an opportunity to significantly enhance accessibility on Windows 11 24H2.
While utilizing the GPT-4V API might incur considerable costs for Microsoft, there’s the option to develop a smaller vision model specifically tailored for Windows, akin to CogVLM. This approach would minimize latency and enable operations to run locally, even when the PC is offline.
With forthcoming advancements such as Intel and Snapdragon X Elite chipsets equipped with dedicated NPUs, running smaller models on-device becomes feasible. Alternatively, if Microsoft opts to run its in-house developed visual model on the cloud, it would likely incur significantly lower costs.
Another notable example is the recent unveiling of Rabbit R1, an AI-first hardware device designed to streamline tasks. Powered by a Large Action Model (LAM), Rabbit R1 boasts the capability to execute a range of actions, all initiated with voice input. Whether it’s ordering pizza, sending emails, or booking flights, this device intelligently handles various tasks with ease.
If a small startup like Rabbit can achieve this, then surely a tech giant like Microsoft, with its vast resources, can as well. Thus far, Microsoft has developed its own Phi-2 model, a small LLM, primarily for research purposes. However, if Microsoft truly aims to deliver AI-powered PCs in 2024, it must develop Windows-specific vision models capable of running agents locally with minimal latency. Microsoft should create something akin to a Large Action Model (LAM) that runs locally and is designed to perform actions rather than solely engage in chatbot conversations.
A Fresh Approach Required for Windows Copilot
In conclusion, the current iteration of Windows Copilot as a chatbot offers severely restricted utility, with functionality already duplicated by numerous browser extensions and Edge Copilot. Microsoft must adopt a novel strategy to realize the vision of AI-powered PCs.
Unlike its rival Apple, renowned for meticulously crafting products before public release, Microsoft tends to take a different approach. It often rushes to market with products lacking functional and substantive features at launch. This hasty deployment undermines user experience and impedes the realization of Microsoft’s objectives.
This exemplifies Microsoft’s approach to AI, which appears somewhat haphazard. The company has begun branding Edge as an “AI browser” simply by integrating a chatbot. Moreover, efforts are underway to infuse AI capabilities into Notepad and various other first-party apps such as MS Paint, Snipping Tool, and Office applications.
While these in-app AI features may benefit certain users, for Windows to truly become an intelligent operating system driven by AI, Microsoft must move past the fixation on integrating chatbots and instead embrace fresh ideas and innovative approaches.