Guru: AI Pair Programming In RPG With Continue
March 10, 2025 Gregory Simmons
In my last article, I shared a brief introduction to the GitHub Copilot extensions. These extensions provide an easy way to get up and running with an AI coding assistant to aid you in your RPG development. Being cloud-based, Copilot is lightweight in terms of system resources, but it does cost a little money per month, per user.
For this article, I would like to share with you what I have learned about a newer extension for VS Code named Continue. Continue runs locally on your PC, is 100 percent free, and as of this writing, is the leading open-source AI code assistant.
While at TechXchange this past October, I had the pleasure of meeting Adam Shedivy, a software developer for IBM. He was generous with his time and gave me my first introduction to a self-hosted AI code assistant with the Continue extension.
To get started, you need to download a tool that allows you to run large language models (LLMs) locally on your computer. There are several, but for better or worse, I chose Ollama. It is quite popular, runs on Windows, macOS, and Linux, and is open source. Anyway, you can download Ollama here: https://ollama.com/download/windows
Then, in VS Code, go to the extensions marketplace, search for the Continue extension, and install it. Next, I installed one of the LLMs to use. You can learn about the variants within the Qwen 2.5 Coder series of models here: https://ollama.com/library/qwen2.5-coder I’m currently using both the 1.5b and 7b versions. To install the 7b version, I ran this command in a PowerShell terminal within VS Code: ollama run qwen2.5-coder:7b
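The first time you run that command, it downloads the model (if it isn’t already on your machine) and then drops you into an interactive chat session right in the terminal. If you just want to download a model without chatting, ollama pull does that; for example, here is one way to grab the 1.5b version that I use for autocomplete later in this article:

ollama pull qwen2.5-coder:1.5b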
These LLMs can get quite large, so I don’t recommend downloading one while connected to your phone’s hotspot and waiting for a flight at LaGuardia Airport (guilty). Wait until you’re home or in the office, wherever you’ve got a good, snappy connection. Once done, you can double-check that the LLM is loaded with the list command:
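Your installed models should show up with their sizes. The IDs and numbers below are only illustrative of what the output looks like:

ollama list
NAME                  ID              SIZE      MODIFIED
qwen2.5-coder:7b      2b0496514337    4.7 GB    5 minutes ago
qwen2.5-coder:1.5b    6d3abb8d2d53    986 MB    2 hours ago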
Next, I added the model into the Continue extension:
On the ‘Add Chat model’ screen, change the provider to Ollama and change the model to autodetect, then click Connect. Then in the dropdown list, you will see an option for autodetect – qwen2.5-coder:7B. In this screen capture, you can see that I have been tinkering with some other models, which have been autodetected as well.
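If you prefer to work in the configuration file directly, the ‘Add Chat model’ screen simply writes an entry into Continue’s config.json. On my setup it ends up looking roughly like this; treat it as a sketch only, since Continue’s config format does change between releases:

"models": [
  {
    "title": "Ollama",
    "provider": "ollama",
    "model": "AUTODETECT"
  }
]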
Now, you’re ready to ask the model anything! The Continue extension supports starting a question with ‘@’ and a subject to add context. This can greatly improve the usefulness of the answers the model returns for you. For example, if you want the response to your request to be contextually based on DB2 SQL, you could start your request with @Db2i. There is a lot to explore here, but perhaps we’ll save that for another time.
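For example, a request using that context might look something like this (the table name is made up, just to show the shape of a tagged prompt):

@Db2i Write an SQL statement that returns the ten most recent orders from my ORDERS table.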
The LLM used for autocomplete can, and probably should, be different from the one used in chat. Based on Adam’s insight, and confirmed by my own experimentation, the smaller LLMs return autocomplete suggestions more quickly, while the larger LLMs give more robust answers in chat. From what I have seen so far, the 1.5b version of Qwen2.5-Coder offers code completion suggestions that are just as good as the larger versions, and the speed difference is very noticeable!
To get ‘tab to complete’ functionality similar to GitHub Copilot, you need to edit your Continue config.json file. In VS Code, press F1, type config.json, and press Enter to open it. Then add a tabAutocompleteModel entry at the top level of the JSON object, alongside the entries that are already there (don’t forget the comma separating it from its neighbor):
{ "tabAutocompleteModel": { "title": "Qwen2.5-Coder 1.5B", "provider": "ollama", "model": "qwen2.5-coder:1.5b" } }
After saving that change, I opened up a source member in my library and code completion suggestions were fairly quick as well as pretty accurate.
A discussion of running LLMs locally on your PC would be incomplete without mentioning performance. When I began exploring running Ollama and LLMs locally, I started out by going for what I perceived as the ‘most powerful’ model in the Qwen offering and installed qwen2.5-coder:32b. My laptop is a Dell Precision 3570 with a 12th Gen Intel Core i7-1255U at 1.70 GHz and 64GB of RAM, and I never had enough patience to wait for a response from the 32b LLM. I then loaded the 7b LLM and started getting responses, but I wanted to see if I could speed it up.
Anytime you’re talking about performance, you need a ‘measuring stick.’ To set the baseline and test for an improvement, I returned to my PowerShell terminal within VS Code and instructed Ollama to run the model in verbose mode. Then I asked it my test question: ‘How much rain is the equivalent of 6 inches of snow?’
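Put together, the test amounted to something like this in the terminal:

ollama run qwen2.5-coder:7b --verbose
>>> How much rain is the equivalent of 6 inches of snow?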
I received an interesting response discussing the density of the snow, the weight of water, and so on. Since I ran the model with the --verbose switch, I also got some performance statistics. The two I focused on were total duration and eval rate:
total duration: 40.2912856s
eval rate: 4.63 tokens/s
Okay, good, that set my baseline of measurement. Now, on a fresh install, these LLMs in the Qwen family were using half of the available cores and no more than half of the system RAM. There’s a lengthy discussion about this here: https://github.com/ollama/ollama/issues/2496. But I thought I would try adjusting the num_thread parameter to match my number of cores: 10.
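One way to do that (a sketch, and not the only way) is to bake the parameter into a copy of the model with a Modelfile, so the setting sticks between sessions. Put these two lines into a file named Modelfile:

FROM qwen2.5-coder:7b
PARAMETER num_thread 10

Then build the copy and run it; the qwen7b-10t name is just something I made up:

ollama create qwen7b-10t -f Modelfile
ollama run qwen7b-10t --verbose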
Then I asked the exact same question, and while the response was a little different, it still talked about the weight of rainfall, the density of snow, and so on. The metrics showed a considerable improvement:
total duration: 27.6182394s
eval rate: 6.82 tokens/s
This gives us a good starting point. We now know how to load and use the Continue extension, how to evaluate the various LLMs, and how to tweak the performance. I encourage you to do your own research on your system, try the different LLMs and find which combination works best for you.
When compared to a paid code assistant such as Copilot, Continue is completely free, but it does take a little more setup and may also inspire you to upgrade your RAM. One upside, however, for those who are concerned about processing AI requests in the cloud (as Copilot does): Continue lets you keep all of your AI requests on your own machine.
Until next time, happy (assisted) coding.
Gregory Simmons is a Project Manager with PC Richard & Son. He started on the IBM i platform in 1994, graduated with a degree in Computer Information Systems in 1997 and has been working on the OS/400 and IBM i platform ever since. He has been a registered instructor with the IBM Academic Initiative since 2007, an IBM Champion and holds a COMMON Application Developer certification. When he’s not trying to figure out how to speed up legacy programs, he enjoys speaking at technical conferences, running, backpacking, hunting, and fishing.
RELATED STORIES
Guru: AI Pair Programming In RPG With GitHub Copilot
Guru: RPG Receives Enumerator Operator
Guru: RPG Select Operation Gets Some Sweet Upgrades
Guru: Growing A More Productive Team With Procedure Driven RPG
Guru: With Procedure Driven RPG, Be Precise With Options(*Exact)
Guru: Testing URLs With HTTP_GET_VERBOSE
Guru: Fooling Around With SQL And RPG
Guru: Procedure Driven RPG And Adopting The Pillars Of Object-Oriented Programming
Guru: Getting Started With The Code 4 i Extension Within VS Code
Guru: Procedure Driven RPG Means Keeping Your Variables Local
Guru: Procedure Driven RPG With Linear-Main Programs
Guru: Speeding Up RPG By Reducing I/O Operations, Part 2
Guru: Speeding Up RPG By Reducing I/O Operations, Part 1
Guru: Watch Out For This Pitfall When Working With Integer Columns