ChatGPT, The Worst Summer Intern Ever

Back when I used to work in the pharma industry, I had the opportunity to hire summer interns. This was a long time ago, long enough that the fresh-faced college students who applied for the gig are probably now creeping up to retirement age. The idea, as I understood it, was to get someone to help me with my project, which at the time was standing up a distributed data capture system with a large number of nodes all running custom software that I wrote, reporting back to a central server running more of my code. It was more work than I could manage on my own, so management thought they’d take mercy on me and get me some help.

The experience didn’t turn out quite like I expected. The interns were both great kids, very smart, and I learned a lot from them. But two months is a very tight timeframe, and getting them up to speed took up most of that time. Add in the fact that they were expected to do a presentation on their specific project at the end of the summer, and the whole thing ended up being a lot more work for me than if I had just done the whole project myself.

I thought about my brief experience with interns recently with a project I needed a little help on. It’s nothing that hiring anyone would make sense to do, but still, having someone to outsource specific jobs to would be a blessing, especially now that it’s summer and there’s so much else to do. But this is the future, and the expertise and the combined wisdom of the Internet are but a few keystrokes away, right? Well, maybe, but as you’ll see, even the power of large language models has its limit, and trying to loop ChatGPT in as a low-effort summer intern leaves a lot to be desired.

Continue reading “ChatGPT, The Worst Summer Intern Ever”

Text-to-Speech Model Can Do Music, Background Noises, And Sound Effects

Bark is a universal text-to-audio model that can not only create realistic speech, it can incorporate music, background noises, and sound effects. It can even include non-speech sounds like laughter, sighs, throat clearings, and similar elements. But despite the fact that it can deliver such complex results, it’s important to understand some of the peculiarities.

The model takes a prompt and generates the resulting sound from scratch. Results might sometimes be unexpected.

Bark is not a conventional text-to-speech program, and how it works has a lot more in common with large language model AI chatbots. This means that results can deviate from expectations, and outputs aren’t necessarily going to be studio-quality speech. As the project’s README points out, “(generated outputs can) be anything from perfect speech to multiple people arguing at a baseball game recorded with bad microphones.” That being said, there is some support for voice presets as a way to help guide the model with some consistency.

Bark was designed by a company called Suno for research purposes and is available under the MIT License. It can be installed and run locally, and has some demos available as well as an online implementation.

The ability to install and run Bark locally is promising territory for incorporating it into projects. And should you be more interested in speech-to-text instead, don’t forget about this plain C/C++ implementaion of AI-powered speech recognition.

Bridging A Gap Between LLMs And Programming With TypeChat

By now, large language models (LLMs) like OpenAI’s ChatGPT are old news. While not perfect, they can assist with all kinds of tasks like creating efficient Excel spreadsheets, writing cover letters, asking for music references, and putting together functional computer programs in a variety of languages. One thing these LLMs don’t do yet though is integrate well with existing app interfaces. However, that’s where the TypeChat library comes in, bridging the gap between LLMs and programming.

TypeChat is an experimental MIT-licensed library from Microsoft which sits in between a user and a LLM and formats responses from the AI that are type-safe so that they can easily be plugged back in to the original interface. It does this by generating JSON responses based on user input, making it easier to take the user input directly, run it through the LLM, and then use the output directly in another piece of code. It can be used for things like prototyping prompts, validating responses, and handling errors. It’s also not limited to a single LLM and can be fairly easily modified to work with many of the existing models.

The software is still in its infancy but does hope to make it somewhat easier to work between user inputs within existing pieces of software and LLMs which have quickly become all the rage in the computer science world. We expect to see plenty more tools like this become available as more people take up using these new tools, which have plenty of applications beyond just writing code.

ChatGPT V. The Legal System: Why Trusting ChatGPT Gets You Sanctioned

Recently, an amusing anecdote made the news headlines pertaining to the use of ChatGPT by a lawyer. This all started when a Mr. Mata sued the airline where years prior he claims a metal serving cart struck his knee. When the airline filed a motion to dismiss the case on the basis of the statute of limitations, the plaintiff’s lawyer filed a submission in which he argued that the statute of limitations did not apply here due to circumstances established in prior cases, which he cited in the submission.

Unfortunately for the plaintiff’s lawyer, the defendant’s counsel pointed out that none of these cases could be found, leading to the judge requesting the plaintiff’s counsel to submit copies of these purported cases. Although  the plaintiff’s counsel complied with this request, the response from the judge (full court order PDF) was a curt and rather irate response, pointing out that none of the cited cases were real, and that the purported case texts were bogus.

The defense that the plaintiff’s counsel appears to lean on is that ChatGPT ‘assisted’ in researching these submissions, and had assured the lawyer – Mr. Schwartz – that all of these cases were real. The lawyers trusted ChatGPT enough to allow it to write an affidavit that they submitted to the court. With Mr. Schwartz likely to be sanctioned for this performance, it should also be noted that this is hardly the first time that ChatGPT and kin have been involved in such mishaps.

Continue reading “ChatGPT V. The Legal System: Why Trusting ChatGPT Gets You Sanctioned”

Leaked Internal Google Document Claims Open Source AI Will Outcompete Google And OpenAI

In the world of large language models (LLM), the focus has for the longest time been on proprietary technologies from companies such as OpenAI (GPT-3 & 4, ChatGPT, etc.) as well as increasingly everyone from Google to Meta and Microsoft. What’s remained underexposed in this whole discussion about which LLM will do more things better are the efforts by hobbyists, unaffiliated researchers and everyone else you may find in Open Source LLM projects. According to a leaked document from a researcher at Google (anonymous, but apparently verified), Google is very worried that Open Source LLMs will wipe the floor with both Google’s and OpenAI’s efforts.

According to the document, after the open source community got their hands on the leaked LLaMA foundation model, motivated and highly knowledgeable individuals set to work to take a fairly basic model to new levels where it could begin to compete with the offerings by OpenAI and Google. Major innovations are the scaling issues, allowing these LLMs to work on far less powerful systems (like a laptop or even smartphone).

An important factor here is Low-Rank adaptation (LoRa), which massively cuts down the effort and resources required to train a model. Ultimately, as this document phrases it, Google and in extension OpenAI do not have a ‘secret sauce’ that makes their approaches better than anything the wider community can come up with. Noted is also that essentially Meta has won out here by having their LLM leak, as it has meant that the OSS community has been improving on the Meta foundations, allowing Meta to benefit from those improvements in their products.

The dire prediction is thus that in the end the proprietary LLMs by Google, OpenAI and others will cease to be relevant, as the open source community will have steamrolled them into fine, digital dust. Whether this will indeed work out this way remains to be seen, but things are not looking up for proprietary LLMs.

(Thanks to [Mike Szczys] for the tip)

ChatGPT Makes A 3D Model: The Secret Ingredient? Much Patience

ChatGPT is an AI large language model (LLM) which specializes in conversation. While using it, [Gil Meiri] discovered that one way to create models in FreeCAD is with Python scripting, and ChatGPT could be encouraged to create a 3D model of a plane in FreeCAD by expressing the model as a script. The result is just a basic plane shape, and it certainly took a lot of guidance on [Gil]’s part to make it happen, but it’s not bad for a tool that can’t see what it is doing.

The first step was getting ChatGPT to create code for a 10 mm cube, and plug that in FreeCAD to see the results. After that basic workflow was shown to work, [Gil] asked it to create a simple airplane shape. The resulting code had objects for wing, fuselage, and tail, but that’s about all that could be said because the result was almost — but not quite — completely unlike a plane. Not an encouraging start, but at least the basic building blocks were there. Continue reading “ChatGPT Makes A 3D Model: The Secret Ingredient? Much Patience”

Chatting With Local AI Moves Directly In-Browser, Thanks To Web LLM

Large Language Models (LLM) are at the heart of natural-language AI tools like ChatGPT, and Web LLM shows it is now possible to run an LLM directly in a browser. Just to be clear, this is not a browser front end talking via API to some server-side application. This is a client-side LLM running entirely in the browser.

The ability to run an LLM (natural language AI) directly in-browser means more ways to implement local AI while enjoying GPU acceleration via WebGPU.

Running an AI system like an LLM locally usually leverages the computational abilities of a graphics card (GPU) to accelerate performance. This is true when running an image-generating AI system like Stable Diffusion, and it’s also true when implementing a local copy of an LLM like Vicuna (which happens to be the model implemented by Web LLM.) The thing that made Web LLM possible is WebGPU, whose release we covered just last month.

WebGPU provides a way for an in-browser application to talk to a local GPU directly, and it sure didn’t take long for someone to get the idea of using that to get a local LLM to run entirely within the browser, complete with GPU acceleration. This approach isn’t just limited to language models, either. The same method has been applied to successfully create Web Stable Diffusion as well.

It’s a fascinating (and fast) development that opens up new possibilities and, hopefully, gives people some new ideas. Check out Web LLM’s GitHub repository for a closer look, as well as access to an online demo.