Multi-threaded VBA

| 14 Comments | 0 TrackBacks

VBA is a very capable programming language. With it we can craft extraordinary projects in Excel. However, it can only do one thing at a time. We can let it breath a little and jump between events with the DoEvents command, but I'm pretty sure that VBA will always be a single threaded affair, if for no other reason than complexity. Debugging multi-threaded programs in languages that truly suport threads, like C++, Java and C# is very difficult.

threads.png

A few months ago I penned a guest article for Ajay over at Databison, demonstrating a technique I've used many times to simulate multi-threaded VBA. Today I thought I'd share the technique with my readers and include an enhanced example workbook.

 
I should make it clear that this technique is not multi-threaded VBA, as such does not exist. Instead it is a method of orchestrating multiple processes from Excel, leveraging the fact that Windows is a multitasking operating system.

The scenario is scraping a website, an ideal candidate for this demonstration because it can be an asynchronous task. If you follow a traditional approach you might automate Internet Explorer from VBA. This is probably the worst choice because Internet Explorer is big, fat, and slow, but even worse, Excel's VBA processor will sit waiting synchronously for the information from Internet Explorer, virtually killing the Excel user interface while the program is running.

A better technique is to forgo automating Internet Explorer and to use the MSXML2.XMLHTTP OLE object from Microsoft. While not strictly correct, you can think of it as a web browser with no interface. In other words, it's invisible. It's stripped of everything and just quickly retrieves web page text to memory. It can also send data to a server and is optimized for XML. This technique is much faster than the first. But it still ties up Excel synchronously as it scrapes one record at a time.

My preferred approach is to create and launch a swarm of independent agents from VBA. These agents do not run in Excel. They are not VBA. They are in fact VBScript programs that run in the Windows Scripting Host. I think of them as worker bees, because they are expendable. Each has one mission: go to the website of interest; get one record; write it back to the master Excel file that created it; die. A shotgun blast would be another analogy. After the payload has been received by Excel, a new agent is created and dispatched to replace the one that just terminated. This is simpler to manage than trying to reuse an existing agent.

The swarm method works very well and the speed increase is phenomenal, but dependent on a number of factors. The website being scraped is a big factor. In this demo, we'll be scraping www.realestateabc.com, which is really slow. Several years ago before Zillow.com went to a Flash interface, this technique was unbelievably fast there. The operating system is another factor. I can run several dozen agents in a swarm on Windows XP. The same computer can run hundreds of agents in a swarm under Vista. Windows 7 is likely better than Vista, but I don't have it and so I don't know. But the more agents you can run simultaneously, the better the speed increase. I can only assume that Vista has a better OLE subsystem than XP. And of course the speed of your machine and the amount of memory it has will also affect the size of the swarm and its performance.

The example workbook showcases all three of these techniques allowing you to test the speed of each. I've also included an interesting chart to monitor progress and speed. It's best to only run one instance of Excel during the testing. The first time you set the swarm loose, you may need to authorize each VBScript file, depending on your security settings.

Many of my examples on this site do not use VBA, or they use minimal VBA. This example is heavy on VBA, but interestingly most of the work is done by the VBScript agents when using the swarm. Of course when using the other two techniques, VBA is the workhorse.

For an added bit of enlightenment, run the swarm with the Windows Task Manager open to Processes, and the Image Name column sorted Z-A. You'll see all the agent programs as they are spawned, and then near the end of the run, you'll see them disappear.

This workbook is an example of fairly advanced Excel. The idea of Excel launching a swarm of programs that run in a different language and ultimately each write back to the Excel workbook that created them is... out there! Even the chart inclusion that monitors the progress and speed is a sophisticated specimen. I think that most users could learn a tremendous amount by fully studying this example workbook.

Please let me know what you think. If you have any questions on the methodology or the VBA or VBScript coding, please fire away.

Now imagine you had to scrape a hundred thousand records...

Here's the file:



--- UPDATE ---

There's an interesting enhancement to this sample workbook here.
Enhanced by Zemanta
If you liked this article, please share it!



Your Ad Here

No TrackBacks

TrackBack URL: http://www.excelhero.com/cgi-bin/mt/mt-tb.cgi/37

14 Comments

Hi Daniel,

You amaze me everyweek and I spend weekend time trying to understand. Thank you very much for all the good stuff and I think this will take more than weekend time ( I like it).

Regards,
Techcirle

@techcircle -

You are most welcome. If you need any clarification, I'm here to help!

Regards,

Daniel Ferry
excelhero.com

Hi,
That simply blown my mind. I've automated IE once with the slowest method you described and I thought that was pretty good, but the swarm is just amazing.

That being said, I'm not that familiar with VBscript (I mean not at all). Can you point me to a good place to start and tell me what's the general use of VBscript?

Regards

@PAT -

Welcome to my blog.

I'm glad you liked the swarm!

To learn VBScript, a great place to start is:

http://www.visualbasicscript.com/

VBScript has three major uses.

It can be imbedded in an HTML webpage as a sort of analog to JavaScript. Microsoft developed Internet Explorer early on to use VBScript natively for essentially any task that a website developer might do in JavaScript, such as AJAX, animation, etc. The only problem with using VBScript inside of a webpage to control that page is that the only browser that supports it is Internet Explorer; all other browsers will just ignore the VBScript. In this scenario, VBScript is being interpreted and run by Internet Explorer.

The second major use is as the native language used in Active Server Pages (ASP), which was Microsoft’s first server-side script engine for dynamically-generated web pages. Entire web applications can be developed using ASP Classic and VBScript, and millions have been. In this scenario, VBScript is being interpreted and run by Internet Information Services (IIS), Microsoft’s web server.

The third major use, and the scenario used in this article, is to run VBScript in the Windows OS subsystem called the Windows Scripting Host. In effect, this allows VBScript to act as a modern day batch file on steroids. IT administrators often use VBScript to automate management of company workstations, including the company-wide deployment of updates and new programs, backups, diagnostics, etc. In this scenario, VBScript is being interpreted and run by the Windows Scripting Host on any modern Windows computer.

VBScript is a subset of VB6 and VBA. One important difference is that VBScript does not support variable typing, so all variables in VBScript are variants.

Regards,

Daniel Ferry
excelhero.com

Hi Daniel,
This is one of the most fascinating blog I have seen on Excel. Excel has never been this fun before.

Thanks again & Keep up the great work you are doing.
Kedar

@Kedar -

You are most welcome.

I'm trying to keep the content totally original and interesting...

Regards,

Daniel Ferry
excelhero.com

Thanks, so writing vbscript seems similar to VBA, just that it's not executed by Excel but by Windows (to put simply).

From an Excel user point of view, the point is to interact with other software without stalling Excel, right?

regards

@Pat -

You are on the right track.

The point is to allow Excel to act as if the VBA processor was mutli-threaded. If it really were, we could write programs that acted like the swarm natively. It's not so we have to get creative if we want the improvements in speed that multiple threads bring when working on asynchronous tasks. Instead of multiple threads, we are employing multiple processes. I think this scraping demo does a pretty good job of showcasing the improvement possible when marshalling the multitasking Windows OS. Excel just sits there and allows all the extraneous programs (the swarm) to deposit their payloads of data. All Excel does is create the agent from VBA, dispatch it, and update the visualization. And so yes, Excel is not stalled as you put it. The speed increase comes form the fact that many programs are scraping the website and not just one.

Regards,

Daniel Ferry
excelhero.com

Hi Daniel,

What's that all about!? Man oh man! Please, keep on doin' this kind of thing!

Best Oustanding excel blog ever seen!

Regards!

@Luigi -

I certainly will. I love sharing and am trying to find more time to do it.

Thank you for the compliments and welcome.

Regards,

Daniel Ferry
excelhero.com

This is really great. Just a couple of questions that occurred to me while studying the swarm thing:

1. In the .vbs files, after doing the oXML.send, it seems you just sleep for a fixed time rather than waiting for the READYSTATE_COMPLETE as in the MSXML2 method in the workbook. Any specific reason for this?

2. When you
Set oXL = GetObject(, "Excel.Application")
-- is there a risk of the script finding the wrong Excel instance if there are several Excels running?

This really is a wonderful resource you're supplying and you also provide without cost. I enjoy seeing web sites that have an understanding of the worth of providing a excellent source totally free. Regards!

Excel is an exceptional program! It comes in so useful for almost any industry. The limitless cells and formulas are great for private lending institutions to organize data and information. Anything that works with excel is an awesome software option to look into, especially if it matches the quality of excel.

Hi Daniel,

your insights have been really interesting to me. Inspired by your findings at first I thought of using VBscript workers for multithreading other algorithms but it seems VBscript is much too slow. That's when I developed a hybrid method using VBA workers invoked via VBscript. See more on this here:

http://blog.tkacprow.pl/excel-multithreading-vba-with-vba-worker-threads/

This is just 1 of 3 methods to multithreading in VBA which I discovered. In order to make a final showdown I compared the performance of all threads methods here:

http://blog.tkacprow.pl/excel-multithreading-vba-vs-vbscript-vs-c-net/

Leave a comment

About this Entry

This page contains a single entry by Daniel Ferry published on May 1, 2010 12:19 PM.

Excel Optical Illusions #11 was the previous entry in this blog.

Multi-threaded VBA Update is the next entry in this blog.

Find recent content on the main index or look in the archives to find all content.