Using find with a struct

Solution 1

The syntax Structure.b for an array of structs gives you a comma-separated list, so you'll have to concatenate them all (for instance, using brackets []) in order to obtain a vector:

find([Structure.b] == 6)

For the input shown in the question, the result is as expected:

ans =
     2     3

As Jonas noted, this works only if none of the fields contain empty matrices, because empty matrices vanish in the concatenation and shift the indices of every element after them.
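To make the pitfall concrete, here is a minimal sketch using a small hypothetical struct array T (not part of the original question) in which one b field is empty:

```matlab
% Hypothetical data: element 2 has an empty b field
T(1).b = 3;
T(2).b = [];
T(3).b = 6;
T(4).b = 6;

v = [T.b];            % the empty field vanishes: v is [3 6 6], only 3 elements
idx = find(v == 6);   % idx is [2 3], but the matching elements are T(3) and T(4)
```

The indices returned refer to positions in the concatenated vector, not to positions in the struct array, so they are off by one for every empty field that precedes them.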

Handling structs with empty fields

If you suspect that these fields may contain empty matrices, either convert them to NaNs (if possible...) or consider using one of the safer solutions suggested by Rody.

In addition, I've thought of another workaround that uses strings: concatenate everything into a delimited string, which preserves the positions of the empty fields, and then tokenize it back. In my humble opinion, this is easier to do in MATLAB than handling numerical values stored in cells.

Inspired by Jonas' comment, we can convert empty fields to NaNs like so:

str = sprintf('%f,', Structure.b)
B = textscan(str, '%f', 'delimiter', ',', 'EmptyValue', NaN)

and this allows you to apply find on the contents of B:

find(B{:} == 6)

ans =
     2
     3

Solution 2

Building on EitanT's answer and Jonas' comment, a safer way could be:

>> S(1).a = 7;
   S(1).b = 3;
   S(2).a = 2;
   S(2).b = 6;
   S(3).a = 1;
   S(3).b = [];
   S(4).a = 1;
   S(4).b = 6;

>> find( cellfun(@(x)isequal(x,6),{S.b}) )
ans =
     2     4

It's probably not very fast though (compared to EitanT's version), so only use this when needed.
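As a rough illustration of why isequal is the safe choice here, the sketch below rebuilds the same data with the struct constructor (so that it is self-contained) and checks how an empty field is treated:

```matlab
% Same data as S above, rebuilt so the snippet runs on its own
S = struct('a', {7, 2, 1, 1}, 'b', {3, 6, [], 6});

e  = isequal([], 6);                       % false: an empty field never matches
tf = cellfun(@(x) isequal(x, 6), {S.b});   % 1x4 logical: [0 1 0 1]

% By contrast, cellfun(@(x) x == 6, {S.b}) raises an error, because
% [] == 6 returns an empty array rather than a scalar.
```

isequal always returns a scalar logical, which is what cellfun needs when it collects its results into a plain array.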

Solution 3

Another answer to this question! This time, we'll compare the performance of the following five methods:

  1. My original method
  2. EitanT's original method (which does not handle empties)
  3. EitanT's improved method using strings
  4. A new method: a simple for-loop
  5. Another new method: a vectorized, empty-safe version

Test code:

% Set up test
N = 1e5;

S(N).b = [];
for ii = 1:N
    S(ii).b = randi(6);
end

% Rody Oldenhuis 1
tic
sol1 = find( cellfun(@(x)isequal(x,6),{S.b}) );
toc

% EitanT 1
tic
sol2 = find([S.b] == 6);
toc

% EitanT 2
tic
str = sprintf('%f,', S.b);
values = textscan(str, '%f', 'delimiter', ',', 'EmptyValue', NaN);
sol3 = find(values{:} == 6);
toc


% Rody Oldenhuis 2
tic
ids = false(N,1);
for ii = 1:N
    ids(ii) = isequal(S(ii).b, 6);
end
sol4 = find(ids);
toc

% Rody Oldenhuis 3
tic
idx = false(size(S));
SS = {S.b};
inds = ~cellfun('isempty', SS);
idx(inds) = [SS{inds}]==6;
sol5 = find(idx);
toc

% make sure they are all equal
all(sol1(:)==sol2(:))
all(sol1(:)==sol3(:))
all(sol1(:)==sol4(:))
all(sol1(:)==sol5(:))

Results on my machine at work (AMD A6-3650 APU (4 cores), 4GB RAM, Windows 7 64 bit):

Elapsed time is 28.990076 seconds. % Rody Oldenhuis 1 (cellfun)
Elapsed time is 0.119165 seconds.  % EitanT 1 (no empties)
Elapsed time is 22.430720 seconds. % EitanT 2 (string manipulation)
Elapsed time is 0.706631 seconds.  % Rody Oldenhuis 2 (loop)
Elapsed time is 0.207165 seconds.  % Rody Oldenhuis 3 (vectorized)

ans =
     1
ans =
     1
ans =
     1
ans =
     1

On my Homebox (AMD Phenom(tm) II X6 1100T (6 cores), 16GB RAM, Ubuntu64 12.10):

Elapsed time is 0.572098 seconds.  % cellfun
Elapsed time is 0.119557 seconds.  % no empties
Elapsed time is 0.220903 seconds.  % string manipulation
Elapsed time is 0.107345 seconds.  % loop
Elapsed time is 0.180842 seconds.  % cellfun-with-string

Gotta love that JIT :)

And wow... does anyone know why the two systems behave so differently?

Also, a little-known fact: cellfun with one of its built-in string arguments (such as 'isempty') is incredibly fast, which goes to show how much overhead anonymous functions carry.
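If you want to check this on your own machine, a minimal sketch along the following lines should do. The cell array C is hypothetical test data, and the timings will vary with MATLAB version and hardware, so no numbers are claimed here:

```matlab
% Hypothetical test data: 1e5 cells, every seventh one empty
C = num2cell(randi(6, 1, 1e5));
C(1:7:end) = {[]};

tic; m1 = cellfun('isempty', C);        t_str  = toc;   % legacy string syntax
tic; m2 = cellfun(@isempty, C);         t_fh   = toc;   % function handle
tic; m3 = cellfun(@(x) isempty(x), C);  t_anon = toc;   % anonymous function

assert(isequal(m1, m2, m3));   % all three produce the same logical mask
fprintf('string: %.4f s, handle: %.4f s, anonymous: %.4f s\n', t_str, t_fh, t_anon);
```

All three calls compute the same mask; only the way the function is passed to cellfun differs.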

Still, if you can be absolutely sure there are no empties, go for EitanT's original answer; that's what MATLAB is for. If you can't be sure, just go for the loop.

Author: CaptainProg

Updated on January 25, 2020

Comments

  • CaptainProg, over 4 years ago

    I have a struct that holds thousands of samples of data. Each data point contains multiple objects. For example:

    Structure(1).a = 7
    Structure(1).b = 3
    Structure(2).a = 2
    Structure(2).b = 6
    Structure(3).a = 1
    Structure(3).b = 6
    ...
    ... (thousands more)
    ...
    Structure(2345).a = 4
    Structure(2345).b = 9
    

    ... and so on.

    If I wanted to find the index number of all the '.b' objects containing the number 6, I would have expected the following function would do the trick:

    find(Structure.b == 6)
    

    ... and I would expect the answer to contain '2' and '3' (for the input shown above).

    However, this doesn't work. What is the correct syntax and/or could I be arranging my data in a more logical way in the first place?