numpy/scipy build adjacency matrix from weighted edgelist

Solution 1

You could use a memory-efficient scipy.sparse matrix:

import numpy as np
import scipy.sparse as sparse

arr = np.array([[0, 1, 1],
                [0, 2, 1],
                [1, 2, 1],
                [1, 0, 1],
                [2, 1, 4]])
# size the square matrix by the largest node index in the first two columns
shape = tuple(arr.max(axis=0)[:2] + 1)
coo = sparse.coo_matrix((arr[:, 2], (arr[:, 0], arr[:, 1])), shape=shape,
                        dtype=arr.dtype)

print(repr(coo))
# <3x3 sparse matrix of type '<class 'numpy.int64'>'
#   with 5 stored elements in COOrdinate format>

To convert the sparse matrix to a dense numpy array, you could use todense:

print(coo.todense())
# [[0 1 1]
#  [1 0 1]
#  [0 4 0]]
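Since the stated goal is scipy.sparse.csgraph.depth_first_tree, the sparse matrix can be fed to it directly (converted to CSR) without ever densifying. A minimal sketch, continuing from the coo matrix built above:

```python
import numpy as np
import scipy.sparse as sparse
from scipy.sparse.csgraph import depth_first_tree

arr = np.array([[0, 1, 1],
                [0, 2, 1],
                [1, 2, 1],
                [1, 0, 1],
                [2, 1, 4]])
shape = tuple(arr.max(axis=0)[:2] + 1)
coo = sparse.coo_matrix((arr[:, 2], (arr[:, 0], arr[:, 1])), shape=shape)

# csgraph routines accept sparse input; CSR avoids any dense N x N copy
tree = depth_first_tree(coo.tocsr(), 0, directed=True)
print(tree.toarray())
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [0. 0. 0.]]
```

The result is itself a sparse matrix holding only the tree edges found by the traversal (here 0→1 and 1→2).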

Solution 2

Try something like the following:

import numpy as np
import scipy.sparse as sps

A = np.array([[0, 1, 1], [0, 2, 1], [1, 2, 1], [1, 0, 1], [2, 1, 4]])
rows, cols, weights = A[:, 0], A[:, 1], A[:, 2]
# the matrix must be square with one row/column per node, so size it by
# the largest node index, not by the number of distinct labels
dim = max(rows.max(), cols.max()) + 1

B = sps.lil_matrix((dim, dim))
for i, j, w in zip(rows, cols, weights):
    B[i, j] = w

print(B.todense())
# [[ 0.  1.  1.]
#  [ 1.  0.  1.]
#  [ 0.  4.  0.]]
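One follow-up worth noting: LIL is the right format for incremental element assignment, but the csgraph routines (and most sparse arithmetic) want CSR, so convert once the matrix is filled. A sketch:

```python
import numpy as np
import scipy.sparse as sps

A = np.array([[0, 1, 1], [0, 2, 1], [1, 2, 1], [1, 0, 1], [2, 1, 4]])
dim = int(A[:, :2].max()) + 1

# LIL supports cheap incremental writes while building
B = sps.lil_matrix((dim, dim))
for r, c, w in A:
    B[r, c] = w

# CSR is the format downstream routines actually want
C = B.tocsr()
print(C.toarray())
# [[0. 1. 1.]
#  [1. 0. 1.]
#  [0. 4. 0.]]
```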
Author: Fabio Lamanna

I am a freelance civil engineer working on transportation networks, urban mobility and data analysis. I work with data coming from urban mobility, cities, social networks and transportation systems. I play with data, organizing the DataBeers and PyData events in Venezia.

Updated on June 15, 2022

Comments

  • Fabio Lamanna
    Fabio Lamanna almost 2 years

I'm reading a weighted edgelist / numpy array like:

    0 1 1
    0 2 1
    1 2 1
    1 0 1
    2 1 4
    

where the columns are 'User1', 'User2', 'Weight'. I'd like to perform a DFS with scipy.sparse.csgraph.depth_first_tree, which requires an N x N matrix as input. How can I convert the previous list into a square matrix like:

    0 1 1
    1 0 1
    0 4 0
    

    within numpy or scipy?

    Thanks for your help.

    EDIT:

    I've been working with a huge (150 million nodes) network, so I'm looking for a memory efficient way to do that.

  • Fabio Lamanna
    Fabio Lamanna about 9 years
    Thanks for the tip! I thought there was some built-in function inside numpy and/or scipy to do that...
  • igavriil
    igavriil about 9 years
    I think that there isn't one. You may want to consider using networkx for creating and manipulating graphs.
  • Fabio Lamanna
    Fabio Lamanna about 9 years
    I tried it, but the problem is that the network is huge: around 150 million nodes. NetworkX crashes because of its dimensions. Your script is good, but it gets a MemoryError when it builds the B matrix. :-(
  • igavriil
    igavriil about 9 years
    What if you replace B with a sparse matrix?
  • Fabio Lamanna
    Fabio Lamanna about 9 years
Do you mean initializing a sparse B matrix? The code now crashes here: B = np.zeros((dim, dim)). But I'm running the script on a machine with 200 GB of RAM; I thought that a np.array was less memory-hungry...
  • igavriil
    igavriil about 9 years
Something like this: import scipy.sparse as sps ... B = sps.lil_matrix((dim, dim))
  • Fabio Lamanna
    Fabio Lamanna about 9 years
Thanks! Everything went smoothly until the todense(): it makes the script run out of memory. But your solution is very fast and with low memory consumption!
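Following the memory discussion in this thread: at the scale mentioned (150 million nodes), the index and weight dtypes dominate memory use, and COO storage is just three 1-D arrays whose size scales with the number of edges, not with n_nodes squared. A sketch, with toy edge data standing in for the real edge list:

```python
import numpy as np
import scipy.sparse as sparse

n_nodes = 150_000_000  # scale mentioned in the comments
# toy edges standing in for the real edge list
rows = np.array([0, 1, 2], dtype=np.int32)
cols = np.array([1, 2, 1], dtype=np.int32)
weights = np.array([1, 1, 4], dtype=np.int8)

# COO storage is exactly these three arrays; memory scales with nnz,
# not with n_nodes ** 2, so the huge shape costs nothing by itself
coo = sparse.coo_matrix((weights, (rows, cols)), shape=(n_nodes, n_nodes))
print(coo.nnz)
# 3
```

The later conversion to CSR (for csgraph) does allocate an index-pointer array of length n_nodes + 1, which is the point where a dense-minded approach like todense() becomes impossible but a sparse one stays feasible.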