What data is scanned, when indexes are used when going through the edges?

Hey,
I understand we can create classes and classes have their own clusters (files).

I would like to understand what actually is read in the case where I go from one class to another using an edge. Given this as imaginary example:

Classes:

  • Group (name)
  • Company (name)
  • Employee (name)
  • Room (location)

These classes have only one supertype V.

Edges:

  • contains
  • employs
  • occupies

Group ---- contains -----> Company — employs ----> Employee ----- occupies ----> Room

Given this query:

SELECT
  name AS employee_name,
  out('occupies').location AS rooms
FROM `Employee`
WHERE in('employs').in('contains').name = 'my holding';

The main questions

  1. Is going through the edge employs going to scan the whole data set (all clusters) of Company class for each single found Employee vertex?
  2. In WHERE in('employs').in('contains') - will the edge contains scan entire Group class for each found vertex through employs edge?
  3. If class Group would have index on name, would it be used with in('contains').name = 'my holding'? Asking as I have an index in that case but EXPLAIN PLAN doesn’t mention it’s usage.
  4. What I can do to have best performance when going through a few edges in the WHERE clause to filter out something?

Thank you!

Hi @lgr

As a high level concept, edge traversal only involves two (plus one) data structures:

  • vertex records
  • edge records

When a vertex has less than (by default) 40 edges of one type, the pointers to the edges are stored inside the vertex itself, so nothing else is involved in the traversal. When you exceed this threshold, the edge pointers are moved in an external structure (sbtree bonsai), than you can consider as another record.
So, at most, when you traverse a single edge you access three data structures with constant time access.

Now, going deeper on the specific query and questions, the first thing I can say is that your WHERE condition is probably a bit inefficient.

That query will be executed as follows:

  • scan all the Emploee class
  • for each record, load all the connected Companies traversing Employs relationship
  • for each company found, traverse the Contains relationship, get the group
  • evaluate the name

Even if you have an index on Group.name, it won’t be used

Good news is that you can rewrite this query in at least two different ways, to make it much more efficient, eg.

SELECT 
  name AS employee_name,
  out('occupies').location AS rooms
FROM (
  SELECT expand(out('contains').out('employs'))
  FROM Group WHERE name = 'my holding'
)

This query will use the index and will only traverse the needed edges

An alternative is to use a MATCH statement

MATCH 
  {class:Group, where:(name = 'my holding')} -contains-> {} -employs-> {as:emp}
RETURN emp.name as employee_name, emp.out('occupies').location AS rooms

I hope it helps

Thanks

Luigi

1 Like

Thank you very much @luigidellaquila !

I see you are using expand, but what if I multiple WHERE conditions, for example like this (not the best one but just as an example):

SELECT
  name AS employee_name,
  out('occupies').location AS rooms
FROM `Employee`
WHERE in('employs').name = 'my company' AND in('employs').in('contains').name = 'my holding'

What will be the correct and efficient way to write it?

Thank you in advance,
Leszek

For queries that have conditions on multiple parts of the pattern, the best approach is to use a MATCH:

MATCH 
  {class:Group, where:(name = 'my holding')} -contains-> {where:(name = 'my company')} -employs-> {as:emp}
RETURN emp.name as employee_name, emp.out('occupies').location AS rooms

You can also use square bracket filtering, but it’s typically less flexible:

SELECT 
  name AS employee_name,
  out('occupies').location AS rooms
FROM (
  SELECT expand(out('contains')[name = 'my company'].out('employs'))
  FROM Group WHERE name = 'my holding'
)

Thanks

Luigi

1 Like

Even better, if you define the class type on the Company element, an index could be used:

MATCH 
  {class:Group, where:(name = 'my holding')} -contains-> {class:Company, where:(name = 'my company')} -employs-> {as:emp}
RETURN emp.name as employee_name, emp.out('occupies').location AS rooms

1 Like

One more thing if I may.

It happens I have a class named Vertex. When I try to use this class with MATCH I get errors like:

  • MATCH {class:Vertex, => Error parsing query: MATCH {class:Vertex
  • with backticks:
MATCH {class:`Vertex`

=> class not defined: Vertex

  • with a double or single quote
MATCH {class:'Vertex'

=> class not defined: 'Vertex'

but the class is there.

SELECT FROM `Vertex`

works.

I am on ODB 3.1.5. Any clue how to escape it?

Another error I face with MATCH queries is with:

MATCH {class:Project, as:project, where:(uuid = 'my-project-ud')}.inE()[relation='participates_in_project'].inV(){as: process}
RETURN project, process

This query fails due to usage of property on the edge [relation='participates_in_project']. If I remove it, the query works.
The error is:

Error parsing query:
MATCH {class:Project, as:project, where:(uuid = 'my-project-ud')}.inE()[relation='participates_in_project'].inV(){as: process} RETURN project, process
                                                                       ^
Encountered " "[" "[ "" at line 1, column 72.
Was expecting one of:
    <RETURN> ...
    "{" ...
    "," ...
    "." ...
    "<" ...
    "--" ...
    "-" ...
    
	DB name="my-db"
	Error Code="1"

For the second one, I’m afraid it’s not supported, but you can easily rewrite it as follows:

MATCH {class:Project, as:project, where:(uuid = 'my-project-ud')}.inE(){where:(relation='participates_in_project')}.outV(){as: process}
RETURN project, process

(please notice that I replaced inV() with outV(), I guess that’s what you meant with your query…)

For the “Vertex” class, that’s strange, let me test it and I’ll give you a feedback

Thanks

Luigi

1 Like

Not sure how it was 3 years ago when my class was defined, but now Vertex is a reserved keyword: SQL Syntax · OrientDB Manual

OK, I managed to reproduce and fix the problem.
The fix will be released with v 3.1.12

Thanks

Luigi

1 Like

Thank you @luigidellaquila
Just to let you know, the same problem occurs in case there is an edge class named Edge.

Yes, the problem is general with back-tick quoted class names, so the fix also covers the Edge case (and any other reserved word)

Thanks

Luigi