Bitwise AND in Sql Server

11,883

Solution 1

Do it in SQL, it will be only 100 times faster than doing it in C# or other front end.

Use the built-in numbers table to break the long string into positions (number series goes up to 2047).

Sample tables

create table users (userid int)
insert users select 1 union all select 2

create table permission (userid int, bigstr varchar(1000))
insert permission
select 1, REPLICATE('0', 56) + '1' -- 57th
        + REPLICATE('0', 32) + '1' -- 90th
        + REPLICATE('0', 64) + '1' -- 155th
        + REPLICATE('0', 845)
insert permission
select 2, REPLICATE('0', 66) + '1' -- 67th
        + REPLICATE('0', 98) + '1' -- 166th
        + REPLICATE('0', 657) + '1' -- 824th
        + REPLICATE('0', 176)

Sample showing all the matching permissions against a list

select *
from users u
inner join permission p on p.userid=u.userid
inner join master..spt_values v on v.type='p'
  and SUBSTRING(p.bigstr,v.number,1) = '1'
  and v.number between 1 and LEN(p.bigstr)  -- or 1000 if it is always 1000
where v.number in (57,90,824)

To find users who have at access to at least one branch in the list:

select distinct u.userid
from users u
inner join permission p on p.userid=u.userid
inner join master..spt_values v on v.type='p'
  and SUBSTRING(p.bigstr,v.number,1) = '1'
  and v.number between 1 and LEN(p.bigstr)  -- or 1000 if it is always 1000
where v.number in (57,90,824)

etc..

Solution 2

You might want to construct strings of the form __1_1____1% for a LIKE query to find all users who have access to branches 3, 5, and 10.

To construct these strings the easiest way is to start with a string of _ characters that's as long as the largest branch number in your set (or larger), and then replace individual _ characters with 1 characters, and then append the % at the end.

As for whether this is faster than doing a loop in the database or a loop in your application, I think your best approach is to just test it.

Solution 3

Bitwise Group Membership:

From the comments, I'm assuming that we are not able to use a link table for group memberships. Here is a bitwise solution that does not use strings. It cannot be an acceptable answer because the number of bits limits the number of groups quite severely. However, by using integers with explicit value comparisons, the database can make use of its indexes efficiently. So I've added it for the case where the number of groups / roles / whatever is limited enough to fit.
PS: Excuse an binary-decimal mess ups, I just plugged stuff in on the fly. Feel free to comment and correct, if I have any errors.

Each group is assigned a bit:

G1: 0001
G2: 0010
G3: 0100
G4: 1000

Users' group memberships are calculated with bitwise &. Here are some examples with the binary and decimal equivalents:

U1: G1:             0001 (01)
U2: G2:             0010 (02)
U3: G3:             0100 (04)
U4: G4:             1000 (08)
U5: G1 & G2:        0011 (03)
U6: G2 & G3:        0110 (06)
U7: G1 & G3:        0101 (05)
U8: G2 & G4:        1010 (10)
U9: G1 & G2 & G4:   1011 (11)

Now, calculate, using iteration from 1-N (N is number of groups) and get a list of all the possible integer values that any particular group can contribute to. For example, G1 will be present in any odd number:

G1' : 0001 (01), 0011 (03), 0101 (05), 0111 (07), 1001 (09), 1011 (11), 1101 (13), 1111 (15)
G2' : 0010 (02), 0011 (03), 0110 (06), 0111 (07), 1010 (10), 1011 (11), 1110 (14), 1111 (15)
G3' : 0100 (04), 0101 (05), 0110 (06), 0111 (07), 1100 (12), 1101 (13), 1110 (14), 1111 (15)
G4' : 1000 (08), 1001 (09), 1010 (10), 1011 (11), 1100 (12), 1101 (13), 1110 (14), 1111 (15)

You can do this with a loop from 1-1000 with a bitwise AND of the group's decimal value 1,2,4,8, etc.
Keep the values in memory, or push them into the table storing your groups' possible memerships e.g. possible_memberships.

Get me users in G1:
   Q: select * from users where group_memberships in (1, 3, 5, 7, 9, 11, 13, 15);
   A: U1, U5, U7, U9

Get me users in G2:
   Q: select * from users where group_memberships in (2, 3, 6, 7, 10, 11, 14, 15);
   A: U2, U5, U6, U8, U9

If you have a groups table with column 'possible_memberships', you can put the values in there, saving you from having to send all the values over wire and allowing the subselect to be cached on the database. p>

Get me users in G3:
   Q: select * from users where group_memberships in (select possible_memberships from groups where name = 'G3');
   A: U3, U7, U6

Solution 4

Use a LIKE query. In Sql Server, an _ used in a LIKE expression matches any single character. To get those users that are in branches 1,5, and 10 , you could to it like this:

SELECT columns FROM Users WHERE BRANCHES LIKE '1___1____1%'

This isn't particularly efficient (it's not very sargable), but it should work and it's likely no worse than your udf option.

View more solutions

11,883

r_honey

Updated on June 04, 2022

Comments

r_honey almost 2 years

I have a very typical situation. We have a table called Users which has a column called Branches (varchar 1000).

The organization can have 1000 branches. So if a user has access to branch 1, 5, and 10, the branches string would look like:

1000100001000000000......

(i.e. 1 for a position a User has branch access to based on the branch's number). Please do not advise better data storage options, this is coming to me from a legacy application that is deployed across continents.

Now given this background (and considering that there can be > 10000 users), I want to search for all Users who have access to any one of given set of branches, e.g. Find all users who have access to either branch 10, 65, 90 or 125.

One easy solution is to convert the desired set of branches (i.e. 10, 65, 90, 125) to a branch string (00000010100 etc), then use a scalar UDF to iterate over both the branch strings and return true at first matching occurence where 2 branch strings have 1, and false if there is not a 1 at common position.

Other than that, I also have an option of searching in application in C#. Some of these users are privileged (approx 1000 or more) and their data is cached in application as it is accessed very frequently. But for other users that are not privileged, data is only in db.

I have 2 questions here: 1) For a db search, is there a better way other than the UDF approach I mentioned. 2) For privileged users, what would be better in terms of performance, search in application (which further can be based on a for loop on branch strings like in UDF, or as a Linq Intersect operator on 2 branch arrays, i.e. a Linq Intersect on [1,5,9,50,80,200] and [6,90,256,300] etc.) Would a db search produce faster results or an application based search?

Consider there might be other parameters for search in both cases, e.g. Last name starts with.

My current approach is to filter rows in db for both situations first on other parameters (like Last name starts with). Then use a scalar UDF to filter this result-set based on branches and then return the results.
- canon about 13 years
  
  What's "typical" about storing relational data in one big concatenated string as opposed to an association table?
- Gabe about 13 years
  
  @antisanity: I'm afraid that this sort of thing is fairly typical. :(
- CobaltBlue over 12 years
  
  @antisanity: I found this question specifically because I've run across this several times and wanted to see how others solved it. It was used alot in the 80's and early-to-mid 90's to pack tons of data into a tiny space. Useful when you don't have 12 gigs of ram or 2TB harddrives. :) Legacy is a tough thing.
Gabe about 13 years

The OP said there can be 1000 branches. Are you suggesting he have a 1000-bit integer computed column?
Abe Miessler about 13 years

@Gabe, I did see that. When I have been told this in the past usually adding something like a calculated field or a view will not be a problem as long as people don't need to change existing processes.
Gabe about 13 years

Sure, maybe he can add a calculated field -- but a 1000-bit one?
Abe Miessler about 13 years

@Gabe, that's true. I didn't take that into consideration. I'm not sure if there is a way around that... @r_honey, take note. This probably won't work for your purposes. I will leave it up incase someone with a smaller number of choices comes across this post.
r_honey about 13 years

Hi Yuliy, this is an interesting solution... I wonder how it would perform. You see I will still need to use a UDF to create a string of the form __1_1__%. Wouldn't it be better than to just use a UDF to iterate over the branch strings and return true or false. I can see though that in this case, only the original search string for branches needs UDF execution a single time that can subsequently be used in a LIKE clause (in contrast to my solution which needs UDF execution on each independent row in the table).
r_honey about 13 years

Hi Gabe, thanks for replying. As you noticed that for a 1000 length bit string, this isn't a feasible solution. At one point, I thought of converting it to a varbinary string, but again Sql Server does not allow Bitwise operators between 2 varbinary values :(
r_honey about 13 years

I really like this solution not in terms of performance but in terms of its innovativeness and adding another thing in my know-how, the master..spt_values table... Thanks Richard...
r_honey about 13 years

BTW Richards, any comments in its perceived performance in relation to other suggested solutions??
RichardTheKiwi about 13 years

while this would be a good solution for searching for ALL matches, how many LIKE expressions do you want for ANY matches? I read the question as ANY any one of given set of branches. Say it is only varchar(10) and your set is 2,3,5, are you proposing like _1________ or .. like __1_______ or .. like ____1_____ ?
RichardTheKiwi about 13 years

@r_honey don't quote me, but it should perform quite well. This will perform in (< k x n) time, where k is the number of branches to look for, and n is the size of users table. And many substring operations, each looking exactly at one position. If you provided more details (k and n), and I had some time, I could cook up some performance tests.
Yuliy about 13 years

@r_honey: I'd create such a string in your application code, rather than a UDF, as it would be a much more natural environment for doing a bunch of string manipulation.
r_honey about 13 years

k (number of branches) is 1000 as I mentioned and n (size of users table) could be pretty large. 10000 can be considered as the minimum but it could be as large as 1 million or maybe even more.
r_honey about 13 years

@Richard: Thanks for pointing this out. I misread the fact that it would match ALL branches and not ANY. @Yuliy: It doesn't really matter where I create this string, in UDF or in application, but as the stored procedure that actually performs the search is being called from many places, I think it would be better for me to use a UDF for avoiding having to change application code.
r_honey over 11 years

Hi Jimmy, thanks for taking time to reply. With a 1000 length string and trying to split it across 16 sets (for a bigint with 64 bits each except the last set) and then joining the result of those sets (in essence map-reduce :)) would be more clumsy I believe than Richard's reply above which has already been accepted as the answer ...
r_honey over 8 years

Nice alternate when feasible Derek, thanks for sharing!!