Find number of concurrent users in a SQL records

10,469

Solution 1

Clearly the number of concurrent users only changes when a user either starts or ends a period, so it is enough to determine the number of concurrent users during starts and ends. So, reusing test data provided by Remus (thank you Remus):

DECLARE @Table TABLE 
(
  UserId int, 
  StartedOn datetime,
  EndedOn datetime
);

insert into @table (UserId, startedOn, EndedOn)
select 1, '2009-7-12 14:01', '2009-7-12 15:01'
union all select 2, '2009-7-12 14:30', '2009-7-12 14:45'
union all select 3, '2009-7-12 14:47', '2009-7-12 15:30'
union all select 4, '2009-7-12 13:01', '2009-7-12 17:01'
union all select 5, '2009-7-12 14:15', '2009-7-12 18:01'
union all select 6, '2009-7-12 11:01', '2009-7-12 19:01'
union all select 1, '2009-7-12 16:07', '2009-7-12 19:01';

SELECT MAX(ConcurrentUsers) FROM(
SELECT COUNT(*) AS ConcurrentUsers FROM @table AS Sessions 
JOIN 
(SELECT DISTINCT StartedOn AS ChangeTime FROM @table
) AS ChangeTimes
ON ChangeTime >= StartedOn AND ChangeTime < EndedOn 
GROUP BY ChangeTime
) AS ConcurrencyAtChangeTimes
-------
5

BTW using DISTINCT per se is not a mistake - only abusing DISTINCT is. DISTINCT is just a tool, using it in this context is perfectly correct.

Edit: I was answering the OP's question: "how one could calculate this using T-SQL only". Note that the question does not mention performance.

If the questions was this: "what is the fastest way to determine maximum concurrency if the data is stored in SQL Server", I would provide a different answer, something like this:

Consider the following alternatives

  1. Write a cursor
  2. Write a CLR cursor
  3. Write a loop on the client
  4. Use an RDBMS with decent cursors, such as Oracle or PostgreSql
  5. For top performance, design your table differently, so that you can retrieve the answer in one index seek. This is what I do in my system if I need to deliver best possible performance.

If the question was "what is the fastest way to determine maximum concurrency using a T-SQL query", I would probably not answer at all. The reason: if I needed really good performance, I would not solve this problem in a T-SQL query.

Solution 2

You can order all events on date order and compute a running aggregate of current users logged in:

DECLARE @Table TABLE 
(
  UserId int, 
  StartedOn datetime,
  EndedOn datetime
);

insert into @table (UserId, startedOn, EndedOn)
select 1, '2009-7-12 14:01', '2009-7-12 15:01'
union all select 2, '2009-7-12 14:30', '2009-7-12 14:45'
union all select 3, '2009-7-12 14:47', '2009-7-12 15:30'
union all select 4, '2009-7-12 13:01', '2009-7-12 17:01'
union all select 5, '2009-7-12 14:15', '2009-7-12 18:01'
union all select 6, '2009-7-12 11:01', '2009-7-12 19:01'
union all select 1, '2009-7-12 16:07', '2009-7-12 19:01';

with cte_all_events as (
select StartedOn as Date
    , +1 as Users
    from @Table
union all 
select EndedOn as Date
    , -1 as Users
    from @Table),
cte_ordered_events as (
select Date
    , Users
    , row_number() over (order by Date asc) as EventId
    from cte_all_events)
, cte_agg_users as (
  select Date
    , Users
    , EventId
    , (select sum(Users) 
        from cte_ordered_events agg
        where agg.EventId <= e.EventId) as AggUsers
    from cte_ordered_events e)
select * from cte_agg_users


2009-07-12 11:01:00.000 1   1   1
2009-07-12 13:01:00.000 1   2   2
2009-07-12 14:01:00.000 1   3   3
2009-07-12 14:15:00.000 1   4   4
2009-07-12 14:30:00.000 1   5   5
2009-07-12 14:45:00.000 -1  6   4
2009-07-12 14:47:00.000 1   7   5
2009-07-12 15:01:00.000 -1  8   4
2009-07-12 15:30:00.000 -1  9   3
2009-07-12 16:07:00.000 1   10  4
2009-07-12 17:01:00.000 -1  11  3
2009-07-12 18:01:00.000 -1  12  2
2009-07-12 19:01:00.000 -1  13  1
2009-07-12 19:01:00.000 -1  14  0

Once you have this in place, finding the number of maximum concurrent sessions is trivial. As you see you have two moments when you had 5 users, at 14:30 (when user 2 logged in) and at 14:47 (when user 3 logged in). Just replace the last query that selects from the CTE to get the actual max:

select top(1) AggUsers 
    from cte_agg_users
    order by AggUsers desc

This solution uses CTEs so it will only work on SQL 2k5, if you're still on SQL 2000 you'll have to rewrite it using derived tables instead of CTEs.

Solution 3

I tried AlexKuznetsov's solution but the result was 49 :(

My solution:

/* Create temporary table and set all dates into 1 column,
so we can sort by this one column */
DECLARE @tmp table (
    Dates datetime,
    IsStartedDate bit )

INSERT INTO @tmp
    SELECT StartedOn, 1 FROM stats
    UNION ALL
    SELECT EndedOn, 0 FROM stats

DECLARE @currentlogins int, @highestlogins int, @IsStartedDate bit;
SET @currentlogins = 0;
SET @highestlogins = 0;

DECLARE tmp_cursor CURSOR FOR 
SELECT IsStartedDate FROM @tmp
ORDER BY Dates ASC

OPEN tmp_cursor

/* Step through every row, if it's a starteddate increment @currentlogins else decrement it
When @currentlogins is higher than @highestlogins set @highestlogins to the new highest value */
FETCH NEXT FROM tmp_cursor 
INTO @IsStartedDate

WHILE @@FETCH_STATUS = 0
BEGIN
    IF (@IsStartedDate = 1)
    BEGIN
        SET @currentlogins = @currentlogins + 1;
        IF (@currentlogins > @highestlogins)
            SET @highestlogins = @currentlogins;
    END
    ELSE
        SET @currentlogins = @currentlogins - 1;

    FETCH NEXT FROM tmp_cursor 
    INTO @IsStartedDate
END

CLOSE tmp_cursor
DEALLOCATE tmp_cursor

SELECT @highestlogins AS HighestLogins
Share:
10,469

Related videos on Youtube

Anne
Author by

Anne

Updated on February 11, 2020

Comments

  • Anne
    Anne over 4 years

    I have the table of following structure:

    UserID   StartedOn          EndedOn
    1        2009-7-12T14:01    2009-7-12T15:01 
    2        2009-7-12T14:30    2009-7-12T14:45
    3        2009-7-12T14:47    2009-7-12T15:30
    4        2009-7-12T13:01    2009-7-12T17:01
    5        2009-7-12T14:15    2009-7-12T18:01
    6        2009-7-12T11:01    2009-7-12T19:01
    1        2009-7-12T16:07    2009-7-12T19:01
    

    I need to find the maximal number of concurrent users that were on line. In the above table the result would be 5 because users set1={1,2,4,5,6} and set2={1,3,4,5,6} were online in the same period.

    Do you have an idea how one could calculate this using T-SQL only?

  • Remus Rusanu
    Remus Rusanu almost 15 years
    -1 This does not count the number of simulateously concurrent users, but the number of users any user is concurrent with. Eg. an user that holds a long session may see 10 different users come and by, but the max number of concurrent users was only 2. Your query will return 11 none the less.
  • Ben Griswold
    Ben Griswold almost 15 years
    I concur with Remus Rusanu's assessment. I didn't see this before posting, but he's absolutely correct. My solution will not work per his comments. Nice catch. +1 on your comment.
  • A-K
    A-K almost 15 years
    @ZippyV, yes, I fixed my query as soon as I had test data. It returns 5 now.
  • chendral
    chendral almost 15 years
    MAX() on a derived table, DISTINCT on a sub-query, GROUP BY. No offense but isn't that exactly what stackoverflow.com/questions/621884/… is about?
  • A-K
    A-K almost 15 years
    Using GROUP BY is what - a common mistake? Are you serious?
  • A-K
    A-K almost 15 years
    Do you have any benchmarks to prove it? How do you know that it is less efficient that other alternatives?
  • Jeff Moden
    Jeff Moden over 11 years
    It's easy to prove that this solution is grossly inefficient, Alex. Just look at the Actual Execution plan arrows. See the one with the "49" count? That comes from 7*7 (7 is the number of rows in the table). That means that without the correct indexing, you have a full blown Cartesian (Square) JOIN that will take a month of Sundays to run on any substantial quantity of rows. With the correct index, you have a wad of smaller but nearly as inefficient Triangular JOINs that will also take quite some time to execute. I'll most some test data in a minute that you can test against.
  • Jeff Moden
    Jeff Moden over 11 years
    I've added the code to build a more substantial test in one of the solutions boxes below. Look for the one that starts with "This is NOT a solution."
  • Jeff Moden
    Jeff Moden over 11 years
    Although I certainly agree with the premise, it would have been nice to see some code. ;-)
  • Jeff Moden
    Jeff Moden over 11 years
    Alex, If you look at the row where UserID = 6, you find the times there encompass all the other times. The correct answer for the rows currently presented in the original post is "7", not "5".
  • Jeff Moden
    Jeff Moden over 11 years
    Remus, this is definitely going in the right direction. The only problem is that you have a nasty Triangular JOIN that prevents this from scaling.
  • Jeff Moden
    Jeff Moden over 11 years
    Agh... My apologies. I misread the data. The max will be "5", not "7".