Excel Partial Match Database Lookup

| 32 Comments | 0 TrackBacks

Here is a technique that I've used countless times when crafting an Excel database project for clients. The goal is to show all records that partially match a lookup string.

partial_match.gif
In the sample file, there are 10,000 records and as you can see the partial match lookup is very quick. It supports wildcard characters.

 
While this is entirely an Excel application, what we are doing here is equivalent to a traditional database SQL statement of:

SELECT *
FROM Data
WHERE (((Data.Description) Like "*search_string*"));

So how do we do this in Excel. It could be done using just formulas, but this is a VBA project. It leverages the brilliant FindAll() function by Chip Pearson. All of the code is in the Sheet2 code module. Have a look. It's succinct, potent.

The file works in all versions of Excel from 2000 forward.

So tell me how you do partial match lookups?

Here is the file.




Please remember to sign-up for the free Excel Hero Newsletter!



--- UPDATE ---

As we have seen many times before,  Excel always has multiple ways of doing things. Sam sent me an awesome tip. I revised the workbook to use the tip and it is now much faster and the code is better too. Chip's FindAll() function is still brilliant, but it was not needed here, so I've removed it in the new version. It is useful for many, many things and I'll write about it often. Sam's tip was to use the AdvancedFilter method of the Range object. The previous version was more than adequate as long as the search string was three or more characters long. Now it's instant even if one character long. In fact, it's instant even if zero characters long!

Here it is.





--- UPDATE #2  ---

As promised in the comments, here is the ADO way of doing the Partial Match Database Lookup. I have taken Nikolai's example and reworked the code somewhat to conform to my style of programming. Using ADO is very, very powerful, as it unlocks the entire universe of real databases, such as SQL Server, Oracle, MySQL, Access, and countless others. Virtually all Windows computers have ADO installed and ADO can read Excel files even if the machine does not have Excel installed! Of course for our purpose, we will be using it from Excel so that is beside the point. But you see, ADO, is a separate library that dozens of programming languages can use. So to use it from Excel we need to reference the ActiveX Data Objects Library from the Tools-References menu of the VBA Editor.

Over the coming months I will share many applications that use ADO to interact with databases. But there is one hiccup that you need to know about. If you are using it to interact with Excel data in the same workbook where you are using ADO, there is a problem. Each time you call the ADO library, a bit of memory will be used up on your computer and you will never get it back until you reboot your machine. This is called a memory leak, and it is poor programming practice. This particular memory leak is a recognized bug and Microsoft has published a KnowledgeBase article about it on their support website:


So you should never do this. However, for instructional purposes I have setup Nikolai's workbook with a check box that allows you to toggle between using local data, and having the VBA code create a copy of the data in a new workbook and accessing that instead (which is the correct way to do it). To see the memory leak in action, just open the Windows Task Manager to the Processes tab while you have the workbook open. Sort the list by Image Name. Find the Excel.EXE process and select it with the mouse. Now start using the Partial Match Database Lookup with ADO workbook to do searches. You will notice if you  have the External Data Method check box checked that the memory will temporarily increase for the Excel.EXE process as ADO is invoked; but you will also notice that the memory is quickly given back to the Windows operating system. This is good and the way things are supposed to work.

Now un-check the External Data Method check box in the workbook and do some searches. You will notice that each time a search is done the memory spikes, but that memory is NEVER released back to Windows. This is the memory leak. If you keep using ADO in this fashion eventually you will run out of memory and your computer will crash. The only way to get that memory back is a reboot.

So this file is interesting and instructive. It shows you how to use a very powerful library to access data, and it demonstrates the correct and incorrect way of doing so, allowing you to toggle back and forth between the two.

Thank you Nikolai for the file. He runs a fantastic website on chess, so please check it out:


Here is his modified Parial Match Database Lookup file that uses ADO:







Enhanced by Zemanta
If you liked this article, please share it!



Your Ad Here

No TrackBacks

TrackBack URL: http://www.excelhero.com/cgi-bin/mt/mt-tb.cgi/63

32 Comments

Daniel,

You can do this with one line of code

SorSht.Range("Data").AdvancedFilter Action:=xlFilterCopy, CriteriaRange:=DestSht.Range("mCriteria"), CopyToRange:=CrtSht.Range("mTarget"), Unique:=False

Also the current method is very slow - Just type "a" in the search box and see the time it takes

I have sent you a file - its pretty fast

@Sam -

Well there you go. Two decades of work and never used the AdvancedFilter method like this.

I've made such extensive use of Chip's FindAll in so many circumstances over the years that I can see it has limited me to other possibilities. There are in fact many situations where it is invaluable and the AdvancedFilter would not work, or work well, but in this case it is apparent that it works better. However, in practice a single character search is not likely - at least not where I've deployed this before.

Excel is amazing and no matter who you are, there is more to learn!

Thank you very much! And great job.

I will tweak it a little to my style of programming and post it.

Regards,
Daniel Ferry
excelhero.com

Thank you Daniel and Sam, this is really useful.
Do you think that this default Excel autocomplete behavior (when typing in one column) could be added to the search_string cell ? Or would it need employing Userform with a ComboBox to simulate it?

Hi Daniel,
As a fan of yours, I’m already used to the amazing creativity of your solutions and the high quality of the resulting apps. The only downside is that we might become addicts and expect nothing short of miracles with every new entry of yours.

For the mere sake of quality I’d like to share some thoughts about today’s topic. I do realize that your approach (and especially the optimized version) gets the job done and many Excel fans will be quite satisfied with the result. My problem is that data manipulations like those tend to push Excel’s envelop a bit too far and probably belong to a completely separate domain (ie databases and SQL). Besides, by including the sample SQL statement you most likely had similar thoughts of yours. But then you asked “How do we do this in Excel?” and it sounded as if there was no point trying to do it the right way using SQL.

In fact, combining the power of SQL and Excel is much simpler than what is widely expected. I’ve re-written the solution using ADO and the, so called, “Jet Engine” (think Access). There aren’t any special requirements or (God forbid) installations. The only trick is to include a new reference (via Tools/References in Office 2003) to the ADO object library. I use “Microsoft ActiveX Data Objects 2.5” but any newer version will work as well.

Everything resides in the same workbook you provided in first place. The essential procedure is about 10 lines of VBA code which makes it a lot shorter than your initial solution and roughly on-par with the optimized version. Needless to say, the power of SQL and the range of the new possibilities that it brings to the table creates a brand new platform of its own. I’m sending you an e-mail with the workbook so that you can attach it here if you want.

@Nikolai -

Firstly, let me say I appreciate your comment and your support of my blog.

There is a method to the order in which I share apps. If you look at the first couple of months there was no VBA at all. Slowly I have been adding more and more. The reason is that many people are scared to death of VBA.

I actually had on the schedule a revised version of the Partial Match Lookup using ADO. My plan was to first show how it can be done natively, and then introduce the concept of external libraries.

No, it is not a big deal, but my incremental plan has been very successful thus far, as my readership continues to expand quite rapidly. I've been coding SQL solutions for 25 years with all manner of databases. I've written almost a hundred production applications with RDO and ADO. My skills in Access development are on par with my Excel abilities, and I consider myself a query/stored procedure expert. I also specialize in designing and coding multi-tier, decision support systems with Excel as the front end.

I really do appreciate your input. I just was not going to go there just yet. However, my goal is to build a community here and so I want to encourage that and will publish your version in another update tomorrow. I will tweak it a little so that it conforms to the style of coding that I've been promoting.

Thanks again!

Regards,
Daniel Ferry
excelhero.com

@Nikolai
Without looking at your code...If everything resides in the same workbook then I would not use SQL to query the same file..It would cause a memory leak and ghost instances of the project in the VBA

This is a very cool, useful application for Excel. I can't think of anything off hand that I could use it for, but I know there will be an opportunity sooner or later!

Hi,

Sorry to drop this on you out of the blue however this problem has been bugging me for a bit.
I was directed here via 'ExcelForum' and have to say I am impressed by what I see.

On to my problem.

I have a spreadsheet of data that is about 150000 rows. Within this is lots of different types of data such as names, addresses, emails and telephone numbers.

What I am trying to do is work out a way to search for data that meets a specific criteria. By this what I mean is that rather than searching for the string 01132334587 I would instead ask Excel to show me all instances of a string of digits that is 11 integers long.

My problem is that i'm unsure how to go about this.
Do you have any starting points?

People have suggested the VLOOKUP function but from what I can see this requires you to know what string of data you are searching for rather than what I am after.

Many thanks for any assistance/advice you can give!

Jinga

@Jinga -

Welcome to my blog!

What you want to do is by no means trivial. The partial match examples here will not work for you as they require a known string to do the match from.

I believe that your best bet is to use Regular Expressions (RegEx, for short). There is a library that you can reference to allow for RegEx pattern matching from within Excel VBA.

About five years ago there was a post by Rob van Gelder on Daily Dose of Excel that showed how to make a User Defined Function in VBA so that RegEx expressions could be used from worksheet cells:

http://www.dailydoseofexcel.com/archives/2005/08/13/pattern-matching/

With RegEx expressions you should be able to find all of the data patterns you could ever think of.

I might even get around to working up an article for Excel Hero on RegEx. In the meantime please sign-up for the Excel Hero Newsletter so you can automatically stay abreast of all developments here!

Regards,

Daniel Ferry
excelhero.com

@Pawel -

Yes, unfortunately the auto-complete feature of a cell is not controllable like that. As far as I know you would need an ActiveX combo box to add that functionality.

Regards,

Daniel Ferry
excelhero.com

@Jinga
The Partical Match will work for what you require
The Advance filter does accept 2 wild cards like Autofilter.

So in the criteria range if you type in '=????
(note the single quote before the equal to)
you will get part numbers that are 4 characters long

RegEx would be necessary for something far more complex

@Daniel-
While the best way is to keep the data in a separate file/database, some times for small projects one might want to keep the data and the results in the same file.
In such cases it is better to use the good old DAO rather than ADO
This has the advantages of letting one create complex queries using SQL statements without the problems of memory leaks.

This also means you avoid the step of creating and deleting a temporary file. Its also faster..

I have sent you a file - its slightly faster

this is amazing

Hi Daniel,

What an excellent blog. My questiont regards excel 2007. Is there a way to use tables instead of [Database] reference? I like using the tables in general and thus most of my templates utilize them. Anyway, I couldn't get the table reference to work (I tried also ListObjects("MyTableName").DataBodyRange).

Teemu

Hello Daniel,

For partial match lookups I've been using the SEARCH function in an extra column in the data sheet and Autofilter with the Criteria =">#VALUE!" to hide non-matching lines.

Jakob.

@Daniel,
Thanks for including my suggestion. Hands down, you’re the host and you should follow your plan to gradually increase the topics’ scope and complexity. I quite agree that using ADO isn’t a small step to make for an average user. Keeping in mind that Excel is so powerful an environment by itself most of the time we’re able to find an adequate solution within its own functionality (eg using AdvancedFilter in our current example). I was only tempted to suggest ADO because, in this particular case, it allows an elegant solution that matches the quality of everything else we can find here. The extra point is that people generally (and wrongly) perceive ADO as being too difficult which is quite subjective. The matter of principle is that whenever it comes to database applications there will always be a delicate boundary beyond which we should prefer using well established DB approaches. Not only would they open new possibilities but they will free us from re-inventing the wheel feeling. Yes, some extra effort is required to get to know SQL but that is well justified by broadening our perspective.

@Sam, Daniel,
Turning a piece of code into a final (suitable for production) solution does involve paying attention to details like those. In my mind, people are often deterred by the additional effort it takes. The bug you mentioned seems only relevant to older versions of Excel. Yes, we can witness the excessive memory usage but that appears to be a fragmentation rather than a memory leak. Closing Excel, with no reboot whatsoever, resolves it on my machine. Moreover, the underlying reason seems to be the numerous creations of new objects of ADODB.Connection and ADODB.Recordset type. If those two are made public and created only once I can hardly notice any unusual memory usage even when querying the same workbook. No doubt, a better application design would be to have the data separate into its own workbook.

@Nikolai -

"If those two are made public....even when querying the same workbook."

This is simply brilliant.....I tried this and you are right,it does not cause a memory leak/Ghost Instances of the project in VBE.

Tested on 2003/2007/2010 running on Win Vista

Thanks for sharing.

Ok, now that the academic discussion is over... How about a little help for the amateurs.

On the Optimized version using the Advanced Filter, could you explain the Search lines a little bit more in detail, please? Specifically...

[filter_criteria] = .Transpose(Array(CRITERIA_HEADER, "*" & [search_string] & "*"))


I'm not understanding the abstractness of this line of code at all.

Thanks for any help you can offer!
Mark

@Sessoms -

There are many more ways to do this, so I would say it's premature to call the discussion over, but I'd be happy to shed some light on the line of code in question.

When you use the AdvancedFilter method of the of Range object to filter that range you need to provide a rule to filter on - a criteria. The AdvancedFilter method is shockingly fast as we have seen, but one hurdle is that you must supply the criteria information in a range (very similar to the way this is done with the database worksheet functions like DGET and DSUM). If you are unfamiliar with that method, in its simplest form basically you need to have a range consisting of two cells, one directly below the other. The top cell of that range will identity which column of the range being sorted to use and the bottom cell will provide a value.

In my implementation of Sam's suggestion to use the AdvancedFilter, I defined a two-cell range on the Data worksheet at Z1:Z2. Most of the time the values in those two cells are blank. But when the search is done, my code temporarily provides the criteria values in those two cells. The top cell is give the value, "Description" and the bottom cell is give the value of the search string you provided when you initiated the search surrounded by asterisks. The asterisks are wild-card characters that tell the AdvancedFilter to find all occurrences of the search string, even when buried inside other strings in a cell.

I chose the constant CRITERIA_HEADER as a way to describe the value for the top cell, which again is set to "Description" at the top of the procedure. [search_string] is a shortcut reference to the cell where you type what you want to find.

The Array function takes those two values and makes a horizontal array out of them. The Transpose method of the Application object takes that horizontal array and converts it into a vertical array. That array is then written to the two cells that define the criteria by simply using the shortcut reference, [filter_criteria] =.

The very next line of code then runs the filter using that criteria range. And the line after that erases the contents of the criteria range since the filter has already been completed.

I hope that helps!

Regards,
Daniel Ferry
excelhero.com

@Daniel

Ahhh... I didn't realize exactly how the .Transpose(Array(...) line was temporarily populating the [filter_criteria] range. I thought it was something more complicated.

And now that you've gone through it element by element, that makes it uber-clear.

Thanks for taking the time to explain that piece a bit more thoroughly! I'm certain some future readers of the blog will appreciate the clarification.

I am automatically ranking you in my top ten solely based on this thread.
Awesome posts! You just saved me about half a bottle of tylenol, a pint of vodka, and a good cry.
I spent the better part of the day looking for a way to do this and it seems so simple now!

I will give this a try tommorow.

Does this/these method/s easily support multiple criteria? I have a large list of contacts and I would like to be able to further refine the results with multiple criteria.

Thanks again!

I must admit that modifying the code to suite my purposes took less time than a cup of coffee :-)

I am a new loyal follower of your blog.

I am currently working on utilizing radio buttons or check boxes to allow the search to be done on different fields.

I am a bit perplexed on how to set up multiple search boxes for multiple criteria. Also, I would like to see multiple words return values with either/both AND/OR........so "senior center" would return ALL instances of "senior" AND/OR "center".

I will keep you up to date on my progress. You honestly saved me hours of (bloat)coding to achieve the same result.

Shane, any chance you would like to share your revised version?

@Shane,
I’m sure, you already noticed that the AdvancedFilter approach will give you a lot of power to build multi-criteria filters (the reason it’s called advanced). In fact the interface AF implements is something known for ages -- originally called QBE (Query By Example) it is widely used in MS-Access, MS-Query and many others. The truth is that such a GUI appears to be a “SQL free” mechanism which makes it attractive to users who try to avoid using SQL for whatever reason. In reality, under the hood, our QBE criteria are first translated to SQL and then some sort of SQL engine gets the job done. But why am I boring you to death with those details (sorry, Sessoms)? Because, there’s only certain level of query complexity that’s supported by QBE and for anything beyond that you should be ready to use real SQL. After that, you will be able to do amazingly powerful filtering bordering unbelievable.

@Daniel,
Thank to Sessoms’s remark above I remembered to ask a question of mine related to your very first solution. It has to do with Find/FindNext methods pair of Range object. Let’s imagine that we want to use Chip's FindAll() but from another procedure of ours which has a loop, uses Find/FindNext methods of its own (on a different object) and then calls FindAll() (from inside the loop). I clearly remember that nesting the calls like that didn’t work for me in the past. At the time, I concluded that some sort of static context is causing side effects and I gave up on Find/FindNext altogether. Is there something I overlooked?

@Nikolai
"Because, there’s only certain level of query complexity that’s supported by QBE and for anything beyond that you should be ready to use real SQL."

Well while I don't deny the power of SQL, but the AF criteria can be used for something very complex.

The fact that formulas are also allowed in the criteria range of the AF opens up enormous possibilities for complex querying. This helps particularly as Excel have a wide variety of functions compared to SQL.

I have sent Daniel a few examples of AF, may be he will share it on the blog.

i want to become a professional in excel

Very good discussion here.
@Nikolai, @Sam, @Daniel: On filtering possibilities,I've been playing around with a non-VBA approach in MS Query like so: I have many one-column named ranges...one for each criteria that I want to filter on. THese are used as tables in my SQL query to filter my database in the WHERE clause. In the example below, Number_min Number_max and Gender_equals are one-column named ranges containing my criteria.


SELECT distinct data.Number
, data.Gender
, data.Salary
, data.GivenName
, data.MiddleInitial
, data.Surname
, data.StreetAddress
, data.City
, data.State
, data.ZipCode
, data.Country
, data.Birthday
, data.BloodType
, data.Pounds

FROM data, Number_min, Number_max, Gender_equals

Where data.Number >= Number_min.Number_min
AND data.Number AND data.Gender like Gender_equals.Gender


For this, database and criteria need to be in the same workbook, as I don't believe MS query can pull records from two different workbooks natively). And of course this workbook will be open, meaning there will be a memory leak. But this isn't a major problem for me with this data.


I've uploaded an image to http://screencast.com/t/3dhsUyTt

A (slight) disadvantage of this is that changing the criteria doesn’t trigger the query to update…you have to do this manually, or use some VBA. But that’s pretty minor.

I've also been playing around with using parameters in MS query to filter a database, but this can get tricky as these don't accept arrays, so if you want to search say for three different surnames in the data.surname column you need to set up three different parameters.
I've uploaded an image of this implementation to http://screencast.com/t/3I23Fj5K


Further to Nikolai's comment above that The bug you mentioned seems only relevant to older versions of Excel

When I query the open workbook using MSQuery ODBC, in excel 2010 I get no memory leak altogether, but in excel 2003 I get a sizable memory leak.

When I query the open workbook using DAO, in excel 2010 and in excel 2003 I get small (and tolerable) memory leaks.

I wonder what excel 2007 does?

@Jeff.
Re Memory leak:
The memory leak is there once you build/refresh the query via VBA irrespective of the version - but making the ADODB.Connection and ADODB.Recordset variables to Global seemed to resolve the issue of the memory leak as suggested by Nikolai

Re Arrays in Parameters:
You can specify multiple or criteria in parameters.
But you need to use the WHERE IN Clause rather than the WHERE along with multiple OR's

May be this will get covered in the Academy in the Access section...If not I will post a sample file in the forums

@sgbhide reThe memory leak is there once you build/refresh the query via VBA irrespective of the version When I set up a query via MS Query that pulls data from sheet 1 to Sheet 2, here's what the EXCEL.EXE process shows for 10 refreshes:

35108
42128
44636
48084
48072
48068
48088
48072
48068
48088


And when I do the same thing with the modified Parial Match Database Lookup file that uses ADO that Daniel posted above, here's what EXCEL.EXE process shows for 10 refreshes:

68356
68380
69660
68664
69872
68664
68896
68772
68592
68544


That's why I hypothesise that the memory leak bug has been fixed in excel 2010 (and likely excel 2007 as well).

Hello,

This is fantastic! Exactly what I was looking for. I was wondering if it's possible to alter it to not clear the whole sheet when you research. I would like to hide the search results, but use the data in some formulas and display it on the same sheet. For instance, I use the search function to find a bunch of prices. Then I want to display the average price based on those search results. The problem is when I research it doesn't just overlay the results it also deletes all of my formulas. Any help you could provide would be great! Thank you again for this great tool!!

Hello, I came to this site while researching a solution and this looks like an excellent approach. It amazes me that these few lines of code can do such powerful things and you are doing a lot of us a big favor by sharing it.

Correct me if I'm wrong, but this looks like it can be used to create user-defined queries completely in Excel. At least, I'd like to combine this with some dynamic charts to make some useful analytical tools.

Unfortunately, although I know my way around SQL statements I have very little VBA so please excuse me when I ask some basic questions:

1.) How can I return only desired columns from my matches rather than the whole bunch? In your example, suppose I only wanted 'Dealer Price'. (In reality, I have a gynormous excel db where I want to pick and choose only a few fields. Easy as pie in SQL, not sure how to do it here.)

2.) The answer you gave to Sessoms was clear enough to him, but I'm afraid I still don't understand. Where do you specify cells Data!Z1:Z2 as the location for the search string? This whole business of using adjacent cells as search criteria is hard for me to get at. Some additional explanation would be very welcome.

In fact, let me go so far as to encourage you to revisit this subject and give some step-by-step advice on how to implement this. I know that I would be happy to shower you with praise and/or PayPal donations :o)

But whatever you choose to do, thanks for a great piece of code!

Leave a comment

About this Entry

This page contains a single entry by Daniel Ferry published on July 15, 2010 8:35 AM.

Excel Formulas Based Sudoku Solver was the previous entry in this blog.

Excel Optical Illusions Week #22 is the next entry in this blog.

Find recent content on the main index or look in the archives to find all content.