How to build a knowledge graph?

11,459

Solution 1

Knowledge graph is a buzzword. It is a sum of models and technologies put together to achieve a result. The first stop on your journey starts with Natural language processing, Ontologies and Text mining. It is a wide field of artificial intelligence, go here for a research survey on the field.

Before building your own models, I suggest you try different standard algorithms using dedicated toolboxes such as gensim. You will learn about tf-idf, LDA, document feature vectors, etc.

I am assuming you want to work with text data, if you want to do image search using other images it is different. Same for the audio part.

Building models is only the first step, the most difficult part of Google's knowledge graph is to actually scale to billions of requests each day ...

A good processing pipeline can be built "easily" on top of Apache Spark, "the current-gen Hadoop". It provides a resilient distributed datastore which is mandatory if you want to scale.

If you want to keep your data as a graph, as in graph theory (like pagerank), for live querying, I suggest you use Bulbs which is a framework which is "Like an ORM for graphs, but instead of SQL, you use the graph-traversal language Gremlin to query the database". You can switch the backend from Neo4j to OpenRDF (useful if you do ontologies) for instance.

For graph analytics you can use Spark, GraphX module or GraphLab.

Hope it helps.

Solution 2

I know I'm really late but first to clarify some terminology: Knowledge Graph and Ontology are similar (I'm talking in the Semantic Web paradigm). In the semantic web stack the foundation is RDF which is a language for defining graphs as triples (Subject, Predicate, Object). RDFS is a layer on top of RDF. It defines a meta-model, e.g., predicates such as rdf:type and nodes such as rdfs:Class. Although RDFS provides a meta-model there is no logical foundation for it so there are no reasoners that can validate the model or do further reasoning on it. The layer on top of RDFS is OWL (Web Ontology Language). That has a formal semantics defined by Description Logic which is a decidable subset of First Order Logic. It has more predefined nodes and links such as owl:Class, owl:ObjectProperty, etc. So when people use the term ontology they typically mean an OWL model. When they use the term Knowledge Graph it may refer to an ontology defined in OWL (because OWL is still ultimately an RDF graph) or it may mean just a graph in RDF/RDFS.

I said that because IMO the best way to build a knowledge graph is to define an ontology and then use various semantic web tools to load data (e.g., from spreadsheets) into the ontology. The best tool to start with IMO is the Protege ontology editor from Stanford. It's free and for a free open source tool very reliable and intuitive. And there is a good tutorial for how to use Protege and learn OWL as well as other Semantic Web tools such as SPARQL and SHACL. That tutorial can be found here: New Protege Pizza Tutorial (disclosure: that links to my site, I wrote the tutorial). If you want to get into the lower levels of the graph you probably want to check out a triplestore. It is a graph database designed for OWL and RDF models. The free version of Franz Inc's AllegroGraph triplestore is easy to use and supports 5M triples. Another good triplestore that is free and open source is part of the Apache Jena framework.

Share:
11,459
Pippi
Author by

Pippi

Updated on June 15, 2022

Comments

  • Pippi
    Pippi almost 2 years

    I prototyped a tiny search engine with pagerank that worked on my computer. I am interested in building a knowledge graph on top of it, and it should return only queried webpages that are within the right context, similarly to how Google found relevant answers to search questions. I saw a lot of publicity around knowledge graph but not a lot of literature and almost no pseudocode like guideline of building one. Does anyone know good references on how such knowledge graph works internally, so there will be no need to create models about a knowledge graph?