How to Know What You Know: 5-Minute Interview

"I want to know what I know. That describes what knowledge graphs do for companies," said Dr. Alessandro Negro, Chief Scientist at GraphAware.

In this week’s five-minute interview, we discuss how GraphAware uses natural language processing to help companies gain a better understanding of the knowledge that is spread across their organization.

Original Link

Graph Algorithms in Neo4j: The Power of Graph Analytics

According to Gartner, "graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions."

Why did Gartner say this? Because graphs are the best structure for today’s complex and ever-changing data. If you can analyze them at scale to surface key patterns and trends, you will find opportunities that others miss.

Original Link

Graph Algorithms in Neo4j: Connected Data and Graph Analysis

Until recently, adopting graph analytics required significant expertise and determination since tools and integrations were difficult and few knew how to apply graph algorithms to their quandaries and business challenges. It is our goal to help change this.

We are writing this series to help organizations better leverage graph analytics so they can make new discoveries and develop intelligent solutions faster.

Original Link

Effective Internal Risk Models for FRTB Compliance: Risk Management [Infographic]

In this series on the FRTB, we delved into what is required for effective internal risk models using a graph database like Neo4j. In previous weeks, we looked at the requirements of FRTB compliance and the relationship between risk modeling and data lineage.

Last week, we explained why modern graph technology is an effective foundation for compliance applications.

Original Link

Effective Internal Risk Models for FRTB Compliance: Modern Graph Technology

Relational database technology can’t handle what is coming in banking and risk modeling. By the 2020s, Accenture predicts current banking business models will be swept away by a tide of ever-evolving technology and other rapidly occurring changes.

The right foundation for building compliance solutions is graph database technology. Neo4j answers the demands of Fundamental Review of the Trading Book (FRTB) regulations while building a foundation for future investment and risk compliance applications. Neo4j is the world’s leading graph database platform and the ideal solution for tracking investment data lineage.

Original Link

Half-Terabyte Benchmark Neo4j vs. TigerGraph

Graph databases have become more and more popular and are getting lots of attention.

To find out how graph databases perform, I researched the state-of-the-art benchmarks and found that loading speed, loaded data storage, query performance, and scalability are the common benchmark features. However, those benchmarks’ test datasets are too small, ranging from 4 MB to 30 GB. So I decided to run my own benchmark. Let’s play with a huge dataset: half a terabyte.

Original Link

Graphs in RavenDB: Graph Modeling vs. Document Modeling

One of the most important design decisions we made with RavenDB is not forcing users to explicitly create edges between documents. Instead, the edges are actually just normal properties on the documents and can be used as-is. This means that pretty much any existing RavenDB database can immediately start using graph operations, and you don’t need to do anything.

The image below shows an order from RavenDB’s sample Northwind dataset. The highlighted portions mark the edges from this document. You can use these to traverse the graph by hopping from document to document.
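To make "edges are just normal properties" concrete, here is a minimal, database-free Java sketch. It is not RavenDB’s API: documents are plain maps, and an edge is nothing more than a property whose value is another document’s ID. The document IDs and property names used with it are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class DocumentHops {
    // Hypothetical in-memory store: document ID -> document properties.
    // There are no explicit edge objects; a property holding another
    // document's ID simply acts as an edge.
    private final Map<String, Map<String, Object>> docs = new HashMap<>();

    public void put(String id, Map<String, ?> doc) {
        docs.put(id, new HashMap<String, Object>(doc));
    }

    // Hop along one "edge": read the property, treat its value as a document ID,
    // and load the referenced document. Empty if either side is missing.
    public Optional<Map<String, Object>> hop(String fromId, String property) {
        Map<String, Object> from = docs.get(fromId);
        if (from == null) {
            return Optional.empty();
        }
        Object target = from.get(property);
        if (!(target instanceof String)) {
            return Optional.empty();
        }
        return Optional.ofNullable(docs.get((String) target));
    }
}
```

For example, after storing an order whose `Company` property is `"companies/1"`, `hop("orders/1", "Company")` returns the company document, with no edge ever created explicitly.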

Original Link

Graphs4Good: Connected Data for a Better World

You’re reading this because of a napkin.

It was the year 2000, and I was on a flight to Mumbai. Peter, Johan, and I had been building an enterprise content management system (ECM) but kept running up against the challenge of using an RDBMS for querying connected data.

Original Link

Graphs in RavenDB: The Overall Design

Note: This series of posts is about a planned feature and exploring how we go about building it. This is meant to solicit feedback and get more eyes on the idea. Things aren’t set in stone, and we don’t have a firm release date on this.

We have been wanting to add graph queries to RavenDB for several years now, but more important things always got in the way. That didn’t prevent us from discussing this internally and sketching out a few options. We are now looking at this more seriously, and I thought that sharing the details of our deliberations would be interesting and likely to garner us some valuable feedback. I’m going to assume that the reader is at least somewhat familiar with the notion of graph data and graph queries.

Original Link

Fighting Money Laundering and Corruption With Graph Technology

The shocking revelations of the International Consortium of Investigative Journalists (ICIJ), who released both the Panama and Paradise Papers, as well as the West Africa Leaks, have shown that aggressive tax avoidance and money laundering are a widespread and worldwide problem.

Money laundering often correlates with other illegal activities such as terrorist financing and corruption in politics and businesses, while tax avoidance leads to political and social tensions.

Original Link

Effective Internal Risk Models for FRTB Compliance: The Importance of Risk Model Approval

Sweeping regulations are changing the way banks handle risk. The Fundamental Review of the Trading Book (FRTB) represents an important shift designed to provide a firm foundation for the future. While laws passed after the financial crisis offered a patchwork, the FRTB gives banks a reason to put a strong infrastructure in place for the future.

In this series on the FRTB, we explore what it takes to create effective internal risk models using a graph database like Neo4j. This week, we’ll look at the major areas impacted by the FRTB, including raising risk reserves, the trading desk, and the role and approval of internal risk models.

Original Link

Building a Dating Site With Neo4j: Part 12

It’s time to add "visions of love" to our dating site. So far, our posts have been just text status updates and while it is possible to fall in love with someone’s words, it’s harder if they look like the troll that lives under the bridge. So what’s the plan here? Well… like most databases out there, it’s not a good idea to store images in Neo4j. What we are going to store instead is a link to where the image resides, but we also don’t want to deal with having images all over our file system and then having to worry about storage space and replicating them, geographically distributing them for faster access, etc. Hosting images is a problem solved by the use of Content Delivery Networks. So let’s leverage one and build our feature.

There are a ton of CDNs out there; some are cheap, some are expensive, but we are going to go with the "ain’t got none of that sweet VC money" price point and use BunnyCDN. What I like about them is that they are simple. Every time I see the AWS dashboard with its billion services, having to connect S3 to CloudFront to Route 53 feels like overkill.

Original Link

Building a Dating Site With Neo4j: Part 11

Up to this point, our users can send and receive messages, but we don’t have a way to show them all of their conversations. They can see only one conversation at a time, and they have to guess who messaged them before they can see it, which is not very useful. What we need is a directory of all the conversations our user is part of. Let’s go ahead and add this feature to tie things together.

In our Conversations class, we will add a new method "getConversations":
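The method body isn’t reproduced in this excerpt. To give a feel for what a conversation directory has to compute, here is a hypothetical, database-free Java sketch: it collects everyone a user has exchanged messages with and orders them by their most recent message. The `Message` class and the method shape are assumptions, not the post’s Neo4j model.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConversationDirectory {
    // Hypothetical message shape: sender, recipient, and when it was sent.
    public static final class Message {
        final String from;
        final String to;
        final Instant when;

        public Message(String from, String to, Instant when) {
            this.from = from;
            this.to = to;
            this.when = when;
        }
    }

    // Returns the usernames the given user has conversed with,
    // most recently active conversation first.
    public static List<String> getConversations(String username, List<Message> messages) {
        Map<String, Instant> latest = new HashMap<>();
        for (Message m : messages) {
            String other;
            if (m.from.equals(username)) {
                other = m.to;
            } else if (m.to.equals(username)) {
                other = m.from;
            } else {
                continue; // message does not involve this user
            }
            // Keep only the newest timestamp per conversation partner.
            latest.merge(other, m.when, (a, b) -> a.isAfter(b) ? a : b);
        }
        List<String> partners = new ArrayList<>(latest.keySet());
        partners.sort(Comparator.comparing((String p) -> latest.get(p)).reversed());
        return partners;
    }
}
```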

Original Link

Building a Dating Site With Neo4j: Part 10

To see Part 9, go here! I am now at the point where I want to model messaging. There are a couple of ways of doing it. The first one is the simplest:

A user node has a MESSAGED relationship to another user node, the message and the time are stored as properties on the relationship, and that’s it. It’s really easy to understand, but there is a problem with this model. As time goes on and our user starts to have more conversations with various people, their node will fill up with these MESSAGED relationships. How do we know which ones are new? We would have to traverse them all, get their "when" property, sort all the messages by time, and then show the user the most recent ones. This makes our query slower and slower as we add more data, and we want to avoid that. So what do we do? We could try "dated" relationship types:
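A minimal sketch of the "dated" relationship-type idea, assuming a naming scheme like `MESSAGED_2018_07_19` (the exact format is our assumption). Because each day gets its own type, fetching recent messages only requires asking for a handful of type names instead of scanning every MESSAGED relationship. In Neo4j’s embedded Java API such a name would become a type via `RelationshipType.withName(...)`; the sketch only builds the names so that it runs without a database.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class DatedTypes {
    // Assumed naming scheme: MESSAGED_yyyy_MM_dd, one relationship type per day.
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy_MM_dd");

    public static String typeFor(LocalDate day) {
        return "MESSAGED_" + DAY.format(day);
    }

    // Type names covering the last `days` days, newest first, so a query can
    // stop as soon as it has found enough recent messages.
    public static List<String> recentTypes(LocalDate today, int days) {
        List<String> types = new ArrayList<>();
        for (int i = 0; i < days; i++) {
            types.add(typeFor(today.minusDays(i)));
        }
        return types;
    }
}
```

The trade-off is more relationship types in exchange for never touching old messages when showing recent ones.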

Original Link

Building a Dating Site With Neo4j: Part 9

Now that our users can high five and low five each other, we want to show the other person those high fives and low fives. Well…do we really want to show the low fives? I’m not sure. A few years ago we talked about how to store the people who "swiped left" on a user (aka the "assholes" of Tinder). In this case, the user is not rejecting a person forever, they are just putting down one of their posts. If it’s two people who are competing for dates, then maybe the low five has a negative intent, but it would make the person who wrote the post feel they are doing something right. If the low five was from a potential mate, it could be a case of "negging" (which is stupid and you should never do that to people), it could be in jest if it was from someone they already had a conversation with, it could just have negative intent, or maybe a clumsy tap on the wrong button. We don’t really know.

How would you feel if someone high fived one of your posts and low fived another? What about someone giving you a low five every day? I guess we have the blocking capability to deal with abusive behavior. Would people block anyone who gives them a low five? I don’t know… I can’t know until we have users on the site and analyze their behavior. Let’s opt to be permissive for now, with one exception: we don’t want to show fives from blocked users, but we will still count them. We start off our method with a few parameters:
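The method’s parameters aren’t reproduced in this excerpt. As a hypothetical, database-free sketch of the rule just described (count every five, but don’t display fives from blocked users), the `Five` class and the blocked-set representation below stand in for what the real method reads from the graph.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class FiveFilter {
    // Hypothetical shape of a five: who gave it and whether it was high or low.
    public static final class Five {
        final String from;
        final boolean high; // true = high five, false = low five

        public Five(String from, boolean high) {
            this.from = from;
            this.high = high;
        }
    }

    // Fives that may be shown: everything except fives from blocked users.
    public static List<Five> visible(List<Five> fives, Set<String> blocked) {
        List<Five> shown = new ArrayList<>();
        for (Five f : fives) {
            if (!blocked.contains(f.from)) {
                shown.add(f);
            }
        }
        return shown;
    }

    // Counts deliberately include fives from blocked users.
    public static long count(List<Five> fives, boolean high) {
        return fives.stream().filter(f -> f.high == high).count();
    }
}
```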

Original Link

Building a Dating Site With Neo4j: Part 8

Up to this point, we have a timeline of posts from people we want to date, but no way to interact with those people. The first step begins today, as we will allow users to high five and low five posts. Recall that once a user has high fived your post, you will be able to message them for up to 5 days, after which the high five expires. If you do not wish to message them, that’s fine; their high five gives you an additional high five to give to someone else in the hopes that they message you. Remember that all users get 5 "free" high fives a day. If they want more, they have to earn them. You can get a high five on a post that is older than 5 days, and it still counts. This is needed to create the opportunity to bring back a user who hasn’t been to the dating site in a while with a high five on an old post. Otherwise, after 5 days of inactivity, those users would be practically deleted.
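The timing and quota rules above reduce to a couple of simple checks. Here is a hedged sketch using `java.time`, assuming the 5-day window is measured as exactly 5 × 24 hours from the moment of the high five and that earned high fives simply add to the 5 free daily ones. `canMessage` and `allowance` are hypothetical helpers, not the post’s code.

```java
import java.time.Duration;
import java.time.ZonedDateTime;

public class HighFiveRules {
    // Assumption: the window is exactly 5 x 24 hours from the high five.
    static final Duration MESSAGE_WINDOW = Duration.ofDays(5);
    static final int FREE_HIGH_FIVES_PER_DAY = 5;

    // True while the recipient may still message the user who high fived them.
    public static boolean canMessage(ZonedDateTime highFivedAt, ZonedDateTime now) {
        return !now.isAfter(highFivedAt.plus(MESSAGE_WINDOW));
    }

    // Daily allowance: the free high fives plus any earned ones
    // (for example, extra high fives granted by high fives received).
    public static int allowance(int earned) {
        return FREE_HIGH_FIVES_PER_DAY + earned;
    }
}
```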

Let’s start our method off. We’ll need the two users interacting as well as the post that is getting the high five. I tried using just the time of the post instead of leaking the post ID, but, theoretically, somebody could have two posts at the same time, and I had to deal with converting from a ZonedDateTime to a String and back. In the end, I decided it was easier to just use the node ID.

Original Link

Building a Dating Site With Neo4j: Part 6

Without posts, we can’t have High Fives, and that defeats the purpose of our dating site, so it’s time to let our users post things. We want to allow two types of posts: text posts and image posts. Today, we’re going to focus on text posts and getting them working, and we’ll deal with images in another post. The first thing we want to do is prevent users from posting bad things. So we’re going to create a PostValidator to deal with the user input:

    @POST
    public Response createPost(String body,
                               @PathParam("username") final String username,
                               @Context GraphDatabaseService db) throws IOException {
        Map<String, Object> results;
        HashMap<String, Object> input = PostValidator.validate(body);

In our validate method, we will check for the usual things and then use Jsoup.clean to only allow simpleText for now. This prevents XSS-type attacks, and we also don’t want to allow any outbound links. This is not a publishing/advertising platform like Twitter; it’s meant to keep content within the site.

Original Link

Building a Dating Site With Neo4j: Part 5

Go here to view part 4 of this series. Have you ever eaten at a "Fusion Cuisine" type of restaurant? It’s a bit of a gamble. Personally, I’m always up for eating just about anything… except Pho. That stuff messes me up. But back to fusion cuisine. I think my favorite is Indian and Mexican. Take your favorite Indian dish, wrap it in the warm embrace that is a burrito tortilla: heaven. Well, just about anything wrapped in a burrito is perfect. Why am I talking about fusion and wrapping stuff? Well, today we are going to add AutoComplete to our Dating Site, but before we can do that, I need to talk to you about Neo4j’s Fusion Indexes and how they wrap the Lucene Indexes as well as our generation-aware B+tree (GB+Tree) indexes.

One of the tricky things about Neo4j is that we don’t enforce types. For any node (even those with the same label), your "phone number" property may be a 9-digit number, a string, an array of numbers or strings, or something represented by an array of bytes. Neo4j doesn’t care; it’s schema-optional, with a heavy emphasis on the optional. To deal with this, we have the concept of a Fusion Index, which merges indexes for different value types into one.
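The sketch below illustrates the concept only. It is not Neo4j’s implementation and uses none of its APIs. A toy "fusion" index keeps one sorted sub-index per value type, routes each value to the matching sub-index, and answers lookups from that sub-index alone. Keeping strings sorted also turns prefix searches, the basis of autocomplete, into a simple range scan. All class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

public class ToyFusionIndex {
    // One sub-index per value type. A real fusion index would delegate to
    // different index implementations; here both are just sorted maps.
    private final NavigableMap<String, Set<Long>> strings = new TreeMap<>();
    private final NavigableMap<Double, Set<Long>> numbers = new TreeMap<>();

    // Route each value to the sub-index for its type.
    public void add(Object value, long nodeId) {
        if (value instanceof String) {
            strings.computeIfAbsent((String) value, k -> new HashSet<>()).add(nodeId);
        } else if (value instanceof Number) {
            // Toy simplification: all numbers are normalized to double.
            numbers.computeIfAbsent(((Number) value).doubleValue(), k -> new HashSet<>()).add(nodeId);
        } else {
            // Arrays, byte arrays, etc. would each get their own sub-index.
            throw new UnsupportedOperationException("No sub-index for " + value.getClass());
        }
    }

    // Exact lookup: only the sub-index matching the value's type is consulted.
    public Set<Long> seek(Object value) {
        Set<Long> hits = null;
        if (value instanceof String) {
            hits = strings.get(value);
        } else if (value instanceof Number) {
            hits = numbers.get(((Number) value).doubleValue());
        }
        return hits == null ? new HashSet<Long>() : new HashSet<Long>(hits);
    }

    // Prefix search over the sorted string sub-index, as autocomplete needs.
    public Set<Long> startsWith(String prefix) {
        Set<Long> result = new HashSet<>();
        for (Map.Entry<String, Set<Long>> e : strings.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // keys are sorted, so no later key can match
            }
            result.addAll(e.getValue());
        }
        return result;
    }
}
```

A "phone number" stored as the string `"555-1234"` on one node and as the number `5551234` on another lands in different sub-indexes, yet both are reachable through one index object.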

Original Link

An Introduction to DBMS Types

This article will be of interest to those learning about databases who don’t have much prior knowledge of the different types or of the terminology involved. It provides a brief review and links to further information, which I hope is useful to anyone starting to work with databases. If you’re already a wizard with SQL or Neo4j, you won’t need to read any further!

If you’re still with us, let’s first introduce some terminology. A database collects and organizes data, while access to that data is typically via a “database management system” (DBMS), which manages how the data is organized within the database. This article discusses some of the ways a DBMS may organize data. It reviews the difference between relational database management systems (RDBMS) and NoSQL.

Original Link

Building a Dating Site With Neo4j: Part 7

Now it is time to create the timeline for our users. Most of the time, the user wants to see posts from people they could High Five in order to elicit a conversation. Sometimes, they want to see what their competition is doing and what kind of posts are getting responses… and also who they can low five. I don’t think they want to see messages from people who are not like them and don’t want to date them, but I could be wrong.

We need a bunch of parameters for our method. There are the obvious ones, but we’re also adding "city," "state," and "distance" so a user who is traveling can see potential dates in locations outside their usual area. Long-distance relationships are hard, but short out-of-town dates are not. We are also including a "competition" flag to show those posts instead. We’ll make use of these later.
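The "distance" parameter implies filtering posts by how far away their author is. In Neo4j 3.4 this check would typically be done in Cypher with the spatial `distance()` function on `point` values. Purely to show the arithmetic, here is a hypothetical Java helper using the haversine formula. It treats the Earth as a sphere of radius 6,371 km, so the result is an approximation.

```java
public class DistanceFilter {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two latitude/longitude points (in degrees), in km.
    public static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Keep a post only if its author is within maxKm of the searched location.
    public static boolean withinDistance(double lat1, double lon1,
                                         double lat2, double lon2, double maxKm) {
        return haversineKm(lat1, lon1, lat2, lon2) <= maxKm;
    }
}
```

For example, one degree of longitude along the equator, `haversineKm(0, 0, 0, 1)`, comes out at roughly 111.2 km.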

Original Link

Intro to Querying Neo4j Using OGM


Neo4j Object-Graph Mapping, or Neo4j OGM, is a library for modifying and querying Neo4j databases without directly using Cypher.

Conceptually similar to the Java Persistence API (JPA) for relational databases, OGM works by adding annotations to plain-old Java objects (POJOs) that identify them as Neo4j nodes or relationships. New node or relationship objects are created and added to the Neo4j session, which OGM persists by generating and then executing the appropriate Cypher statements.

Original Link

Building a Dating Site With Neo4j (Part 2)

We came up with an idea for a dating site and an initial model in Part One. Next, we are going to work on a back-end HTTP API, because I’m old school and that’s the way I like it. We will build our HTTP API right into Neo4j using an extension, which turns Neo4j from a Server into a Service. Unlike last time, when we wrote a Twitter clone, I don’t really know where I’m going with this, so let’s start with some of the obvious API endpoints and then design and build more as we go along. Is this Agile or am I just being an idiot? I can’t tell, so onward we go.

The first obvious thing is that we need a schema. Luckily, Neo4j is a “Schema Optional” database, so we don’t have to worry about designing any tables or properties or figuring out what kind of properties each table will have. Because… well, we don’t have tables. The only real schema we need to worry about is Constraints and Indexes. For example, we don’t want two users to have the same username or the same email, so we will create a uniqueness constraint on those Label-Property combinations. We also want our users to pick Attributes they have and Attributes they want in a potential mate. To keep things clean and help the matching, we will seed the database with some Attributes and not let users create them dynamically. However, they need to be able to find and search for these Attributes, so we will index their names. Well, we will index a lowercase version of their names, since the current Neo4j schema indexes are CaSe SeNsItIve. So our schema endpoint could start like this:
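The endpoint code isn’t reproduced in this excerpt. As a hedged sketch of what it needs to run, these are Neo4j 3.x Cypher schema statements for the constraints and index described above, plus the lowercase normalization. We assume the lowercase copy lives in a `lowercase_name` property (the Attribute model in Part Four has a field of that name); `SchemaSketch` and `lowercaseName` are hypothetical. On Neo4j 3.x, each statement could be run with `db.execute(statement)` on the `GraphDatabaseService`; newer releases use a `FOR ... REQUIRE` form for constraints instead.

```java
import java.util.List;
import java.util.Locale;

public class SchemaSketch {
    // Neo4j 3.x syntax: uniqueness constraints (each backed by an index)
    // and a plain index on the normalized attribute name.
    public static final List<String> STATEMENTS = List.of(
            "CREATE CONSTRAINT ON (u:User) ASSERT u.username IS UNIQUE",
            "CREATE CONSTRAINT ON (u:User) ASSERT u.email IS UNIQUE",
            "CREATE INDEX ON :Attribute(lowercase_name)");

    // Schema indexes are case sensitive, so we store and search a lowercase copy.
    // Locale.ROOT avoids locale-specific surprises (e.g. the Turkish dotless i).
    public static String lowercaseName(String name) {
        return name.toLowerCase(Locale.ROOT);
    }
}
```

Search input must be normalized with the same function before the lookup, otherwise "Bald" and "bald" would still miss each other.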

Original Link

Building a Dating Site With Neo4j (Part 1)

You might have already heard that Facebook is getting into the dating business. Other dating sites have used graphs in the past, and we’ve looked at finding love using the graph before. It has been a while, though, so let’s return to the topic, making use of the new Date and Geospatial capabilities of Neo4j 3.4. I have to warn you, though, that I’ve been with Helene for almost 15 years and missed out on all this dating site fun; for what I do know, I blame Colin and some pointers from the comments section of this blog post.

Dating sites face a series of challenges, the first of which is a lack of users. There are only two ways to fix that: the first involves having lots of money to pay for national advertisements, and the second involves word of mouth. So you, dear reader, have to either invest a few million dollars or join our new dating site and tell all your friends about it.

Original Link

Building a Dating Site With Neo4j: Part Four

In the last post, we created a User model, built the login and registration pages, hooked everything up in our front-end framework, Jooby, and got the ball rolling. I’m no designer, so I am borrowing an Application Bootstrap Theme and tweaking it as we go along (if you are a designer, pull requests are welcome). At this stage, a ton of it is just mockup, but we will replace it with real functionality. This is what we have so far:

Five years ago, I wrote about Matchmaking with Neo4j in which our users had a list of things they wanted in a potential mate and a list of things they had to offer a potential mate. We are going to call these things Attributes and build them. If we let our users create them, they will make a mess of this list, so I think we’ll have to seed the database with some and maybe grow our list as users request more. Assuming we do that, let’s start with finding an Attribute in the Graph:

    public static Node findAttribute(String name, @Context GraphDatabaseService db) {
        if (name == null) {
            return null;
        }
        Node attribute = db.findNode(Labels.Attribute, NAME, name);
        if (attribute == null) {
            throw AttributeExceptions.attributeNotFound;
        }
        return attribute;
    }

OK, easy enough. Now, what about creating the HAS relationships? We need a POST method since we are creating something. We need the username of the user adding the HAS relationship and the name of the Attribute being added. We also want to check that the user doesn’t already have this attribute, so we don’t create multiple relationships unnecessarily:

    @POST
    @Path("/{name}")
    public Response createHas(@PathParam("username") final String username,
                              @PathParam("name") final String name,
                              @Context GraphDatabaseService db) throws IOException {
        Map<String, Object> results;
        try (Transaction tx = db.beginTx()) {
            Node user = Users.findUser(username, db);
            Node attribute = Attributes.findAttribute(name, db);
            if (userHasAttribute(user, attribute)) {
                throw HasExceptions.alreadyHasAttribute;
            }

If all of that checks out, we create the HAS relationship and set a timestamp on it. For our result, we will return the properties of the Attribute plus some additional information. The user HAS this attribute, so we will set HAVE to true, we will check if they also WANT this attribute, and lastly, we will count how many incoming HAS and WANTS relationships the Attribute has to see how popular it is. One of the nice things about Neo4j is that we store the count of the relationships by direction and type for any node with over 40 relationships right in the first relationship record, making this a very cheap operation compared to other databases.

            Relationship has = user.createRelationshipTo(attribute, RelationshipTypes.HAS);
            // Store the creation time as a UTC ZonedDateTime (read back as a ZonedDateTime in getHas).
            has.setProperty(TIME, ZonedDateTime.now(ZoneOffset.UTC));
            results = attribute.getAllProperties();
            results.put(HAVE, true);
            results.put(WANT, Wants.userWantsAttribute(user, attribute));
            results.put(HAS, attribute.getDegree(RelationshipTypes.HAS, Direction.INCOMING));
            results.put(WANTS, attribute.getDegree(RelationshipTypes.WANTS, Direction.INCOMING));
            tx.success();

Next is returning those relationships. We need a GET method with the username of the person who HAS the relationships, a limit and offset in case they have a lot and we want to paginate through them, and the username of the person looking at this list. We want to be able to tell the person looking if they HAVE any of the attributes the first user WANTS and WANT any of the attributes the first user HAS.

    @GET
    public Response getHas(@PathParam("username") final String username,
                           @QueryParam("limit") @DefaultValue("25") final Integer limit,
                           @QueryParam("offset") @DefaultValue("0") final Integer offset,
                           @QueryParam("username2") final String username2,
                           @Context GraphDatabaseService db) throws IOException {
        ArrayList<Map<String, Object>> results = new ArrayList<>();
        try (Transaction tx = db.beginTx()) {
            Node user = Users.findUser(username, db);
            Node user2;
            HashSet<Node> user2Has = new HashSet<>();
            HashSet<Node> user2Wants = new HashSet<>();
            if (username2 != null) {
                user2 = Users.findUser(username2, db);
                for (Relationship r1 : user2.getRelationships(Direction.OUTGOING, RelationshipTypes.HAS)) {
                    user2Has.add(r1.getEndNode());
                }
                for (Relationship r1 : user2.getRelationships(Direction.OUTGOING, RelationshipTypes.WANTS)) {
                    user2Wants.add(r1.getEndNode());
                }
            }

After we find the HAS and WANTS of the second user, we can check against the Attributes at the end of the HAS relationship for our first user. We want to once again get the degrees of the Attribute to see how popular it is. Lastly, we sort by date and return a subset based on our offset and limit.

            for (Relationship r1 : user.getRelationships(Direction.OUTGOING, RelationshipTypes.HAS)) {
                Node attribute = r1.getEndNode();
                Map<String, Object> properties = attribute.getAllProperties();
                ZonedDateTime time = (ZonedDateTime) r1.getProperty("time");
                properties.put(TIME, time);
                properties.put(HAVE, user2Has.contains(attribute));
                properties.put(WANT, user2Wants.contains(attribute));
                properties.put(WANTS, attribute.getDegree(RelationshipTypes.WANTS, Direction.INCOMING));
                properties.put(HAS, attribute.getDegree(RelationshipTypes.HAS, Direction.INCOMING));
                results.add(properties);
            }
            tx.success();
        }
        results.sort(sharedComparator.thenComparing(timedComparator));
        if (offset > results.size()) {
            return Response.ok().entity(objectMapper.writeValueAsString(
                    results.subList(0, 0)))
                    .build();
        } else {
            return Response.ok().entity(objectMapper.writeValueAsString(
                    results.subList(offset, Math.min(results.size(), limit + offset))))
                    .build();
        }
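The sort-then-page step at the end of getHas is plain collections work. Here is a hedged, database-free sketch of the same idea. `sharedComparator` and `timedComparator` aren’t shown in the post, so the two comparators below are hypothetical stand-ins: we assume "shared" means the viewer also has or wants the attribute, and that newer entries come first. Unlike the code above, this version also clamps negative offsets and limits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class Paging {
    // Hypothetical stand-in for sharedComparator: rows the viewer also has or wants first
    // (false sorts before true, so we compare on "not shared").
    static final Comparator<Map<String, Object>> SHARED_FIRST = Comparator.comparing(
            (Map<String, Object> m) -> !(Boolean.TRUE.equals(m.get("have")) || Boolean.TRUE.equals(m.get("want"))));

    // Hypothetical stand-in for timedComparator: newest first. Assumes "time" holds
    // ISO-8601 UTC strings in one fixed format, which sort correctly as text.
    static final Comparator<Map<String, Object>> NEWEST_FIRST = Comparator.comparing(
            (Map<String, Object> m) -> (String) m.get("time")).reversed();

    // Sort a copy, then return the [offset, offset + limit) window, clamped to the list.
    public static List<Map<String, Object>> page(List<Map<String, Object>> rows, int offset, int limit) {
        List<Map<String, Object>> sorted = new ArrayList<>(rows);
        sorted.sort(SHARED_FIRST.thenComparing(NEWEST_FIRST));
        int from = Math.min(Math.max(offset, 0), sorted.size());
        // Compute in long so a huge limit cannot overflow.
        int to = (int) Math.min((long) sorted.size(), (long) from + Math.max(limit, 0));
        return new ArrayList<>(sorted.subList(from, to));
    }
}
```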

One thing to note about testing: since our custom ObjectMapper returns dates in a specific format, we want to stick to that format when creating our test fixtures:

    "CREATE (fat:Attribute {name:'Fat'})" +
    "CREATE (bald:Attribute {name:'Bald'})" +
    "CREATE (rich:Attribute {name:'Rich'})" +
    "CREATE (jexp)-[:HAS {time: datetime('2018-07-19T17:12:56Z') }]->(fat)" +
    "CREATE (laeg)-[:WANTS {time: datetime('2018-07-19T17:38:57Z')}]->(bald)" +
    "CREATE (max)-[:HAS {time: datetime('2018-07-19T18:33:51Z') }]->(fat)" +

…and expected results.

    private static final ArrayList<HashMap<String, Object>> expected = new ArrayList<HashMap<String, Object>>() {{
        add(new HashMap<String, Object>() {{
            put("name", "Bald");
            put("time", "2018-07-19T19:41:23Z");
            put("has", 1);
            put("wants", 1);
            put("have", false);
            put("want", false);
        }});

I’ll spare you the code, but the WANTS relationship is just a mirror image of what we just built. Let’s hook it back up to our application. First, we need a model for Attribute:

    public class Attribute {
        private Long id;
        private String name;
        private String lowercase_name;
        private String time;
        private Integer wants;
        private Integer has;
        private Boolean want;
        private Boolean have;

We also want a little helper method to display the time in a simpler form. We parse the time String into a ZonedDateTime and then format it the way we want to display it.

        private static final DateTimeFormatter dateFormat = DateTimeFormatter.ofPattern("dd/MM/yyyy");

        public String when() {
            ZonedDateTime dateTime = ZonedDateTime.parse(time);
            return dateFormat.format(dateTime);
        }
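Here is a self-contained version of that helper, pulled out into a hypothetical `AttributeTime` class so it can run on its own. It parses the ISO-8601 string the API returns and re-formats it day-first.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class AttributeTime {
    private static final DateTimeFormatter dateFormat = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    // Parse an ISO-8601 timestamp such as "2018-07-19T17:12:56Z"
    // and render only the date, day first.
    public static String when(String time) {
        ZonedDateTime dateTime = ZonedDateTime.parse(time);
        return dateFormat.format(dateTime);
    }
}
```

For example, `AttributeTime.when("2018-07-19T17:12:56Z")` returns `"19/07/2018"`. Note that the date is taken in the timestamp’s own zone (UTC here), not the viewer’s.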

Next, we connect our front end to the back-end API by adding the endpoint to our Retrofit interface:

    @GET("users/{username}/has")
    Call<List<Attribute>> getHas(@Path("username") String username,
                                 @Query("limit") Integer limit,
                                 @Query("offset") Integer offset,
                                 @Query("username2") String username2);

…and use it in our application. What follows is a little convoluted because we want to show some of the application to users who are not logged in. This allows users who are considering joining the dating site, but aren’t sure, to take a peek and then decide if they want to register. We first figure out who is asking for this data, then check that the requested user is valid, fetch their HAS relationships through the API, and return them in a list.

    get("/user/{username}/has", req -> {
        // Logged-out visitors arrive as "anonymous"; treat them as having no viewer.
        // The null-safe comparison avoids an NPE if the attribute is ever missing.
        String requested_by = req.get("requested_by");
        if ("anonymous".equals(requested_by)) requested_by = null;
        User authenticated = getUserProfile(requested_by);
        Response<User> userResponse = api.getProfile(req.param("username").value(), requested_by).execute();
        if (userResponse.isSuccessful()) {
            User user = userResponse.body();
            Integer limit = req.param("limit").intValue(25);
            Integer offset = req.param("offset").intValue(0);
            Response<List<Attribute>> attributesResponse =
                    api.getHas(user.getUsername(), limit, offset, requested_by).execute();
            List<Attribute> attributes = new ArrayList<>();
            if (attributesResponse.isSuccessful()) {
                attributes = attributesResponse.body();
            }
            return views.attributes.template(authenticated, user, attributes);
        } else {
            throw new Err(Status.BAD_REQUEST);
        }
    });

If it all goes well, then our Application will look like this:

The WANTS relationships are once again a mirror image, so we’ll skip them. It turns out the LIKES and HATES relationships follow the same pattern, but with a Thing instead of an Attribute. If you fast forward a little bit, our back-end API now looks like this:

:POST /v1/schema/create
:GET /v1/users/{username}
:POST /v1/users
:GET /v1/users/{username}/has
:POST /v1/users/{username}/has/{attribute}
:DELETE /v1/users/{username}/has/{attribute}
:GET /v1/users/{username}/wants
:POST /v1/users/{username}/wants/{attribute}
:DELETE /v1/users/{username}/wants/{attribute}
:GET /v1/users/{username}/likes
:POST /v1/users/{username}/likes/{thing}
:DELETE /v1/users/{username}/likes/{thing}
:GET /v1/users/{username}/hates
:POST /v1/users/{username}/hates/{thing}
:DELETE /v1/users/{username}/hates/{thing}
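Because the four relationship families are mirror images, one way to keep them in sync is to drive them from a single table. This is a hypothetical Java sketch, not how the source project is organized; it simply regenerates the relationship routes listed above from one enum.

```java
import java.util.ArrayList;
import java.util.List;

public class UserRoutes {
    // Each family shares the same three endpoints; only the path segment
    // and the target (an Attribute or a Thing) differ.
    enum Family {
        HAS("has", "attribute"), WANTS("wants", "attribute"),
        LIKES("likes", "thing"), HATES("hates", "thing");

        final String segment;
        final String target;

        Family(String segment, String target) {
            this.segment = segment;
            this.target = target;
        }
    }

    public static List<String> routes() {
        List<String> routes = new ArrayList<>();
        for (Family f : Family.values()) {
            String base = "/v1/users/{username}/" + f.segment;
            routes.add("GET " + base);
            routes.add("POST " + base + "/{" + f.target + "}");
            routes.add("DELETE " + base + "/{" + f.target + "}");
        }
        return routes;
    }
}
```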

I don’t want you to be bored to death with every last detail, so we’ll skip the DELETEs and the very similar methods and move on to other parts of the dating site in the next article. For those who want the details, please take a look at the source.

Original Link

Neo4j Launches Commercial Kubernetes Application on Google Cloud Platform Marketplace

On behalf of the Neo4j team, I am happy to announce that today we are introducing the availability of the Neo4j Graph Platform within a commercial Kubernetes application to all users of the Google Cloud Platform Marketplace.

This new offering provides customers with the ability to easily deploy Neo4j’s native graph database capabilities for Kubernetes directly into their GKE-hosted Kubernetes cluster.

The Neo4j Kubernetes application will be “Bring Your Own License” (BYOL). If you have a valid Neo4j Enterprise Edition license (including startup program licenses), the Neo4j application will be available to you.

Commercial Kubernetes applications can be deployed on-premises or even on other public clouds through the Google Cloud Platform Marketplace.

What This Means for Kubernetes Users

We’ve seen the Kubernetes user base growing substantially, and this application makes it easy for that community to launch Neo4j and take advantage of graph technology alongside any other workload they may use with Kubernetes.

Kubernetes customers are already building these kinds of applications. By running Neo4j on Kubernetes, a user can combine the graph capabilities of Neo4j with an existing application, such as one that generates recommendations by looking at the behavior of similar buyers, or a 360-degree customer view that uses a knowledge graph to help spot trends and opportunities.

GCP Marketplace + Neo4j

GCP Marketplace is based on a multi-cloud and hybrid-first philosophy, focused on giving Google Cloud partners and enterprise customers flexibility without lock-in. It also helps customers innovate by easily adopting new technologies from ISV partners, such as commercial Kubernetes applications, and allows companies to oversee the full lifecycle of a solution, from discovery through management.

As the ecosystem leader in graph databases, Neo4j has supported containerization technology, including Docker, for years. With this announcement, Kubernetes customers can now easily pair Neo4j with existing applications already running on their Kubernetes cluster or install other Kubernetes marketplace applications alongside Neo4j.

Original Link

Introduction to Neo4j OGM


Neo4j Object-Graph Mapping, or Neo4j OGM, is a library for modifying and querying Neo4j databases without directly using Cypher.

OGM is conceptually similar to the Java Persistence API (JPA) for relational databases: annotations added to plain-old Java objects (POJOs) identify them as Neo4j nodes or relationships. New node or relationship objects are created and added to the Neo4j session, which OGM persists by generating and then executing the appropriate Cypher statements.

Using OGM to manipulate your Neo4j data gives you compile-time checks of nodes, relationships, labels, and properties that you don’t have when working directly with Cypher, while still allowing your data model to evolve naturally, as with other NoSQL databases.


To follow along, you need:

  • Neo4j Server: either Community or Enterprise Edition; this intro was tested with v3.3.1.
  • Neo4j OGM libraries: the latest version at the time of writing is v3.1.0, available via Maven, Gradle, and Ivy.

Sample Project

This sample project creates family members as Neo4j nodes and establishes marriage and parent-child relationships between them. Objects are created and loaded into Neo4j via OGM without writing any Cypher.

Create Domain Objects


A Neo4j node, also known in graph theory as a vertex, is a data record containing an arbitrary set of properties. Each POJO class is a distinct node type within the Neo4j database. Node types are conceptually similar to relational database tables, except that they do not have to be defined in Neo4j before being used.

Node entity objects have a class-level annotation identifying the class as a node and an annotated member indicating the internally-generated identifier. Other properties do not require annotations if OGM can derive them automatically.

import org.neo4j.ogm.annotation.*;

@NodeEntity
public class Person {
    @Id @GeneratedValue
    private Long id;

    /** Person's year of birth */
    private int birthYear;

    /** Person's name */
    private String name;
    . . .


A Neo4j relationship, also known in graph theory as an arc or an edge, identifies a meaningful, directed relationship between two nodes in a graph. Neo4j OGM provides two techniques for creating relationships.

When relationship-specific properties are required to provide additional definition to the relationship, a Relationship entity object is created. Annotations define starting and ending nodes in the relationship; other properties do not require annotations if OGM can derive them automatically.

import org.neo4j.ogm.annotation.*;

@RelationshipEntity(type = "MARRIED")
public class Married {
    /** Internal Neo4j id of the relationship */
    @Id @GeneratedValue
    private Long id;

    /** If divorced, the year the divorce was finalized */
    private Integer yearDivorced;

    /** The year married */
    private Integer yearMarried;

    /** The wife in the marriage */
    @StartNode
    private Person wife;

    /** The husband in the marriage */
    @EndNode
    private Person husband;
    . . .

If no relationship-specific properties are required, a relationship can instead be declared in the node class as a collection of nodes. Multiple relationships of different types can be defined this way.

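import java.util.ArrayList;
import java.util.List;

@NodeEntity
public class Person {
    @Id @GeneratedValue
    private Long id;
    . . .
    // Start with an empty list (not null) so children can be added directly
    @Relationship(type = "PARENT")
    private List<Person> children = new ArrayList<>();
    . . .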

Load Data

Configure Session

A session is configured by declaring how to connect and authenticate to your Neo4j database and identifying the packages containing the domain objects. Domain objects in other packages are not recognized by OGM and are not persisted to the database.

import org.neo4j.ogm.config.Configuration;
import org.neo4j.ogm.session.SessionFactory;

public class Loader {
    /** Session factory for connecting to the Neo4j database */
    private final SessionFactory sessionFactory;

    // Configuration info for connecting to the Neo4j database
    private static final String SERVER_URI = "bolt://localhost";
    private static final String SERVER_USERNAME = "neo4j";
    private static final String SERVER_PASSWORD = "password";

    /** Constructor */
    public Loader() {
        // Define the session factory; only the listed packages are scanned for domain classes
        Configuration configuration = new Configuration.Builder()
                .uri(SERVER_URI)
                .credentials(SERVER_USERNAME, SERVER_PASSWORD)
                .build();
        sessionFactory = new SessionFactory(configuration,
                "com.buddhadata.sandbox.neo4j.ogm.intro.node",
                "com.buddhadata.sandbox.neo4j.ogm.intro.relationship");
    }
    . . .

Open Session and Transaction

As with JPA, you open a new session to the Neo4j database and begin a transaction in which to do your work.

Warning: For demo purposes, this demo project purges the database in each run; obviously you wouldn’t do this in a production environment!

import org.neo4j.ogm.session.Session;
import org.neo4j.ogm.transaction.Transaction;

public class Loader {
    . . .
    private void process() {
        // For demo purposes, create a session and purge whatever is already in the database
        Session session = sessionFactory.openSession();
        session.purgeDatabase();

        // All work is done in a single transaction
        Transaction txn = session.beginTransaction();
        . . .
    }

Persist Data

Create the needed Node and Relationship objects and save them in the OGM session. Once all objects are passed to the session, commit the transaction.

public class Loader {
    . . .
    private void process() {
        . . .
        Person Carol = new Person("Carol Maureen", 1945);
        Person Courtney = new Person("Courtney Janice", 1945);
        Person Jeremy = new Person("Jeremy Douglas", 1969);
        Person Mike = new Person("Michael Blevins", 1945);
        Person Scott = new Person("Scott Christoper", 1965);

        // Carol and Mike are both parents of Scott, Courtney, and Jeremy
        List<Person> children = Carol.getChildren();
        children.add(Scott);
        children.add(Courtney);
        children.add(Jeremy);
        children = Mike.getChildren();
        children.add(Scott);
        children.add(Courtney);
        children.add(Jeremy);

        // Save the nodes and the MARRIED relationship entity in the session
        session.save(Carol);
        session.save(Courtney);
        session.save(Jeremy);
        session.save(Mike);
        session.save(Scott);
        session.save(new Married(Carol, Mike, 1964, 1973));

        txn.commit();
    }

Check Work

In your browser, navigate to your Neo4j server and execute the following Cypher statement to return all nodes created.

MATCH (n) RETURN n

Hopefully, you now understand the basic concepts of Neo4j OGM and can use it in your own use cases.

The complete demo project can be downloaded from here.

Original Link

The Neo4j JDBC Driver 3.3.1 Release Is Here [+ Examples]

Our team at LARUS has been quite busy since the last JDBC driver release. Today, we’re happy to announce the 3.3.1 release of the Neo4j-JDBC driver.

The release has been upgraded to work with recent Neo4j 3.3.x versions and Bolt driver 1.4.6. (Work on Neo4j 3.4.x and drivers 1.6.x is in progress.)

Neo4j-JDBC Driver Improvements and Upgrades

We worked on a number of improvements:

  • Added the Bolt+routing protocol so the driver can work with a cluster and route transactions to available cluster members.
  • Added support for in-memory databases for testing and embedded use cases.
  • Added a debug feature to better support the development phase or inspect how the driver works when used by third-party tools.
  • Added support for TrustStrategy so that you can now configure how the driver determines if it can trust the encryption certificates provided by the Neo4j instance it is connected to.
  • Implemented the DataSource interface so that you can now register the driver with a naming service based on the Java Naming and Directory Interface (JNDI) API and get a connection via JNDI lookups.
  • PLEASE NOTE: We’ve deprecated using , (comma) as the parameter separator in favor of &, to comply with URL parameter syntax. Please update your connection URLs, because future releases will accept only &. (In the future, we want to use , for parameters that can take a list of values.)

Updated Documentation + Matlab Example

The documentation has been updated to explain how to use the new features and now includes a Matlab example.

Open connection:

conn = database('','neo4j','test','org.neo4j.jdbc.BoltNeo4jDriver', 'jdbc:neo4j:bolt://localhost:7687')

Fetch Total Node Count:

curs = exec(conn,'MATCH (n) RETURN count(*)')
curs = fetch(curs);
ans = '102671'

Besides Matlab, Neo4j-JDBC can, of course, be used with many other tools. Here is a short list:

  • Squirrel SQL
  • Eclipse / BIRT
  • Jasper Reports
  • RapidMiner Studio
  • Pentaho Kettle
  • Streamsets

API/Interface Work for JDBC Compatibility

We implemented the DataSource interface so that you can now register the driver with a naming service based on the Java Naming and Directory Interface (JNDI) API and get a connection via JNDI lookups. This should help a lot when you need a server-managed connection to Neo4j in a JEE environment.

We also added implementations for several methods in Driver, Connection, Statement, ResultSet that were not there previously.

This helps you use the Neo4j-JDBC driver with MyBatis and other frameworks, like Spring JDBC.

Introducing New Support for Causal Clustering

It’s not always easy to adapt the brand-new Neo4j features and protocols to an old-fashioned interface such as the Java Database Connectivity (JDBC). This is because the capabilities of Neo4j Clusters and the Neo4j Java Bolt driver are evolving very rapidly.

Our latest task at LARUS was to make the Neo4j-JDBC driver interact with a Neo4j Causal Cluster providing all the client-side clustering features supported by the Bolt driver:

  • The possibility to route reads and writes to the server with the correct role
  • Defining a routing context
  • Managing bookmarks for causal consistency
  • Supporting multiple bootstrap servers

We’re very happy to present what we’ve been able to achieve!

Bolt+Routing Protocol

If you’re connecting to a Neo4j Causal Cluster and you want to manage routing strategies, the JDBC URL must have this format:

jdbc:neo4j:bolt+routing://host1:port1,host2:port2,...,hostN:portN/?username=neo4j,password=xxxx

Note the new protocol, jdbc:neo4j:bolt+routing, which creates a routing driver.

The list of [host:port] pairs in the URL corresponds to the list of servers that are participating as Core instances in the Neo4j Cluster. If you or your preferred tool doesn’t support this format, you can fall back to the dedicated parameter routing:servers, as in the following example:

jdbc:neo4j:bolt+routing://host1:port1?username=neo4j,password=xxxx,routing:region=EU&country=Italy&routing:servers=host2:port2;...;hostN:portN

In that case, the address in the URL must be that of a Core server, and the alternative servers must be separated by ; (instead of ,).
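If you generate connection URLs in code, both forms can be assembled from a list of Core servers. This is a minimal sketch, not part of the Neo4j-JDBC API: the RoutingUrls class and its method names are hypothetical, values are not URL-encoded, and it uses & between parameters, following the deprecation note in this release rather than the older comma separator shown in the examples above.

```java
import java.util.List;

// Hypothetical helper (not part of the Neo4j-JDBC API) that assembles the two
// bolt+routing URL forms described above. Values are not URL-encoded, so this
// assumes user names, passwords, and host names contain no reserved characters.
final class RoutingUrls {
    private static final String PREFIX = "jdbc:neo4j:bolt+routing://";

    private RoutingUrls() {}

    /** Form 1: all Core servers listed in the URL, separated by commas. */
    static String hostListUrl(List<String> coreServers, String user, String password) {
        if (coreServers.isEmpty()) {
            throw new IllegalArgumentException("at least one Core server is required");
        }
        return PREFIX + String.join(",", coreServers)
                + "?username=" + user + "&password=" + password;
    }

    /** Form 2: one Core server in the URL, the others in routing:servers, separated by semicolons. */
    static String routingServersUrl(List<String> coreServers, String user, String password) {
        if (coreServers.isEmpty()) {
            throw new IllegalArgumentException("at least one Core server is required");
        }
        StringBuilder url = new StringBuilder(PREFIX)
                .append(coreServers.get(0))
                .append("?username=").append(user)
                .append("&password=").append(password);
        List<String> others = coreServers.subList(1, coreServers.size());
        if (!others.isEmpty()) {
            url.append("&routing:servers=").append(String.join(";", others));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        List<String> cores = List.of("core1:7687", "core2:7687", "core3:7687");
        System.out.println(hostListUrl(cores, "neo4j", "secret"));
        System.out.println(routingServersUrl(cores, "neo4j", "secret"));
    }
}
```

The first form keeps every Core server visible in the URL; the second is for tools that only accept a single host before the query string.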

Routing Context

A routing driver with a routing context is available with a Neo4j Causal Cluster of version 3.2 or above. In such a setup, you can specify a preferred routing context via the routing:policy parameter.

jdbc:neo4j:bolt+routing://host1:port1,host2:port2,...,hostN:portN?username=neo4j,password=xxxx,routing:policy=EU

For custom routing strategies, you can use the generic routing: parameter:

jdbc:neo4j:bolt+routing://host1:port1,host2:port2,...,hostN:portN?username=neo4j,password=xxxx,routing:region=EU&country=Italy

Access Mode (READ, WRITE)

Transactions can be executed in either read or write mode (see access mode), which is a really useful feature to support in JDBC too. The user can start a transaction in read or write mode via the Connection#setReadOnly method.

Note: Don’t call that method while a transaction is open. If you do, the driver will raise an SQLException.

By using this method, when accessing the Neo4j Causal Cluster, write operations will be forwarded to Core instances while read operations will be managed by all cluster instances (depending on routing configuration).

You can find an example after the next paragraph.


When working with a Causal Cluster, causal chaining is carried out by passing bookmarks between transactions in a session (see “causal chaining” in the Neo4j docs).

The JDBC driver allows you to read bookmarks by calling the following method:

connection.getClientInfo(BoltRoutingNeo4jDriver.BOOKMARK);

Of course, you can set the bookmark by calling the corresponding method:

connection.setClientInfo(BoltRoutingNeo4jDriver.BOOKMARK, "my bookmark");
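To see why passing the bookmark matters, here is a toy model of causal consistency. It is for illustration only and rests on a stated assumption: the bookmark is reduced to the id of the last committed transaction, and a member may serve a causally consistent read only once it has applied at least that transaction. Real bookmarks are opaque strings, and a lagging server waits until it catches up rather than being skipped; the CausalReads and Member names are hypothetical.

```java
import java.util.List;
import java.util.Optional;

// Toy model of bookmark-based causal consistency, for illustration only.
// Assumption: a bookmark is reduced to the id of the last committed transaction.
final class CausalReads {

    /** A cluster member that has applied transactions up to lastAppliedTx. */
    static final class Member {
        final String name;
        final long lastAppliedTx;

        Member(String name, long lastAppliedTx) {
            this.name = name;
            this.lastAppliedTx = lastAppliedTx;
        }
    }

    /** A member can serve a read that follows the bookmark once it has applied that transaction. */
    static boolean canServe(Member member, long bookmarkTx) {
        return member.lastAppliedTx >= bookmarkTx;
    }

    /** First member that is already up to date with the bookmark, if any. */
    static Optional<Member> memberForRead(List<Member> members, long bookmarkTx) {
        return members.stream().filter(m -> canServe(m, bookmarkTx)).findFirst();
    }

    public static void main(String[] args) {
        // The CREATE committed as transaction 42; replica1 has not caught up yet.
        List<Member> members = List.of(new Member("replica1", 41), new Member("core2", 42));
        System.out.println(memberForRead(members, 42).map(m -> m.name).orElse("none")); // core2
        System.out.println(memberForRead(members, 0).map(m -> m.name).orElse("none"));  // replica1
    }
}
```

Without a bookmark (modeled here as 0), the read could land on replica1, which has not yet seen the write.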

Bolt+Routing With Bookmark Example

import java.sql.*;
import static org.junit.Assert.assertEquals;

// BoltRoutingNeo4jDriver is provided by the Neo4j-JDBC driver; password is defined elsewhere.
String connectionUrl = "jdbc:neo4j:bolt+routing://localhost:17681,localhost:17682,localhost:17683,"
        + "localhost:17684,localhost:17685,localhost:17686,localhost:17687?noSsl&routing:policy=EU";
try (Connection connection = DriverManager.getConnection(connectionUrl, "neo4j", password)) {
    connection.setAutoCommit(false);

    // Access to CORE instances, as the connection is opened by
    // default in write mode (connection.setReadOnly(false))
    try (Statement statement = connection.createStatement()) {
        statement.execute("CREATE (:Person {name: 'Anna'})");
    }
    // closing transaction before changing access mode
    connection.commit();

    // printing the transaction bookmark
    String bookmark = connection.getClientInfo(BoltRoutingNeo4jDriver.BOOKMARK);
    System.out.println(bookmark);

    // Switching to read-only mode to access all cluster instances
    connection.setReadOnly(true);
    try (Statement statement = connection.createStatement()) {
        try (ResultSet resultSet = statement.executeQuery(
                "MATCH (p:Person {name:'Anna'}) RETURN count(*) AS total")) {
            if (resultSet.next()) {
                // long (not Long) avoids an ambiguous assertEquals overload
                long total = resultSet.getLong("total");
                assertEquals(1L, total);
            }
        }
    }
    connection.commit();
}

Thanks to the bookmark, we expect the count of Person nodes named Anna to be 1 (given an empty database), even though we switch from a Core node, where we performed the CREATE operation, to some other instance in the cluster, where we perform the MATCH operation.
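The access-mode rules exercised in this example can also be captured in a toy class. This is a sketch under stated assumptions, not the driver's implementation: ToyRoutingConnection and its methods are hypothetical, the round-robin member choice is invented for the example, and it only mimics the behavior described earlier (write transactions go to Core instances, read transactions may go to any member, and changing the mode while a transaction is open raises an SQLException).

```java
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Toy model of the access-mode rules described above; not the driver's implementation.
final class ToyRoutingConnection {
    private final List<String> cores;
    private final List<String> allMembers;
    private boolean readOnly = false;        // write mode by default
    private boolean transactionOpen = false;
    private int cursor = 0;                  // simple round-robin position (an assumption)

    ToyRoutingConnection(List<String> cores, List<String> readReplicas) {
        if (cores.isEmpty()) {
            throw new IllegalArgumentException("at least one Core instance is required");
        }
        this.cores = List.copyOf(cores);
        List<String> all = new ArrayList<>(cores);
        all.addAll(readReplicas);
        this.allMembers = List.copyOf(all);
    }

    void setReadOnly(boolean readOnly) throws SQLException {
        if (transactionOpen) {
            throw new SQLException("cannot change access mode while a transaction is open");
        }
        this.readOnly = readOnly;
    }

    /** Opens a transaction and returns the name of the member it is routed to. */
    String beginTransaction() {
        transactionOpen = true;
        // Writes may only use Core instances; reads may use any member.
        List<String> candidates = readOnly ? allMembers : cores;
        return candidates.get(Math.floorMod(cursor++, candidates.size()));
    }

    void commit() {
        transactionOpen = false;
    }
}
```

Committing before calling setReadOnly, as the example above does, is exactly what keeps the model (and the real driver) from rejecting the mode change.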


We really hope you enjoyed our work, and we’d love to hear from you, not just about issues, but also how you use the JDBC driver in your projects or which tools you use that we haven’t mentioned.

If you want to use the Neo4j-JDBC driver in your application, you can depend on org.neo4j:neo4j-jdbc:3.3.1 in your build setup, while for use with standalone tools it’s best to grab the release from GitHub.

Original Link

Neo4j 3.4 Release Highlights in Less Than 8 Minutes [Video]

Hi everyone,

My name is Ryan Boyd, and I’m on the Developer Relations team here at Neo4j. I want to talk to you today about our latest release, Neo4j 3.4.


In Neo4j 3.4, we’ve made improvements to the entire graph database system, from scalability and performance to operations, administration, and security. We’ve also added several new key features to the Cypher query language, including spatial querying support and date/time types.


Let’s talk about the scalability features in Neo4j 3.4.

In this release, we’ve added Multi-Clustering support. This allows your global Internet apps to horizontally partition their graphs by domain, such as country, product, customer or data center.

Now, why might you want to do this? You might want to use this new feature if you have a multi-tenant application that wants to store each customer’s data separately. You might also want to use this because you want to geopartition your data for certain regulatory requirements or if you want enhanced write scaling.

Look at the four clusters shown in the image above. Each of these clusters has a different graph, but they are managed together. They can also be used by a single application with Bolt routing the right data to the right cluster, and the data is kept completely separate.

Read Performance

As with all releases, in Neo4j 3.4 we made a number of improvements to read performance.

If you look at a read benchmark in a mixed workload environment, you can see that from Neo4j 3.2 to 3.3 we improved performance by 10%.

Now, for this release, we spent the last several release cycles working on an entirely new runtime for Neo4j Enterprise Edition. I’m proud to say that in Neo4j 3.4, we’ve made all queries use this new Cypher runtime, and that improves performance by roughly 70% on average.

Write Performance

Write performance is also important.

In our ongoing quest to take writes to the next level, we’ve been hammering away at one component that incurs roughly 80% of all overhead when writing to a graph. Now, what component it is may not be so obvious — it’s indexes.

Lucene is fantastic at certain things. It’s awesome at full text, for instance, but it turns out to be not so good for ACID writes with individually indexed fields. So, we’ve moved from using Lucene as our index provider to using our native Neo4j index.

We’ve actually moved to a native index for our label groupings in 3.2, for numerics in 3.3, and now, with the string support in 3.4, we’ve added a lot of the common property types to the new native index. This is what results in our significantly faster performance on writes.

Our native index is optimized for graphs. It is ACID-compliant and gives you fast reads and, as you can see, approximately 10 times faster writes. The image below shows the write performance for the first 3.4 release candidate when writing strings.

At the point in the chart where we implemented the new native string index, overall write performance improves by approximately 500%.

Ops and Admin

We’ve also made a number of improvements around operations and administration of Neo4j in the 3.4 release. Perhaps the most important is rolling upgrades.

Neo4j powers many mission-critical applications, and something many customers have told us is that they want the ability to upgrade their cluster without any planned downtime. This feature enables just that. So if you’re moving from Neo4j 3.4 to the next release, you could do it by upgrading each member in the cluster separately in a rolling fashion.

Neo4j 3.4 also adds auto cache reheating. So, let’s say that you normally heat up your cache when your Neo4j server starts. When you restart your server the next time, we’ll automatically handle the reheating of your cache for you.

The performance of backups is also important to many of our customers, and they are now two times faster.

Spatial & Date/Time Data

With Neo4j 3.4, we’ve now added the power of searching by spatial queries. Our geospatial graph queries allow you to search in a radius from a particular point and find all of the items that are located within that radius. This is indexed and highly performant.

In addition to supporting the standard X and Y dimensions, we’ve also added support so that you can run your queries in three dimensions. Now, how you might use this is totally up to you.

Think about a query like “Recommend a shirt available in a store close by in the men’s department.” You can take your location and find the different stores. And then, once you’re in a particular store, you can use that third dimension support — the Z axis — to find the particular floor and rack where that shirt is available.
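Conceptually, a radius search is a distance filter. The sketch below shows only that logic as a linear scan over a hypothetical Item type with Cartesian x/y coordinates plus a z value for the floor; Neo4j itself answers such queries in Cypher and, as noted above, uses an index instead of scanning. Geographic (latitude/longitude) points would need a great-circle distance rather than this Euclidean formula.

```java
import java.util.List;
import java.util.stream.Collectors;

// Linear-scan illustration of a 3D radius query using Cartesian coordinates.
// Neo4j answers such queries with an index; this only shows the filter logic.
final class RadiusSearch {

    /** Hypothetical item with a Cartesian position (x, y) and a floor height z. */
    static final class Item {
        final String name;
        final double x, y, z;

        Item(String name, double x, double y, double z) {
            this.name = name;
            this.x = x;
            this.y = y;
            this.z = z;
        }
    }

    /** Euclidean distance between two points in 3D space. */
    static double distance(double x1, double y1, double z1, double x2, double y2, double z2) {
        double dx = x1 - x2, dy = y1 - y2, dz = z1 - z2;
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    /** Names of the items whose distance from (x, y, z) is at most radius, in input order. */
    static List<String> within(List<Item> items, double x, double y, double z, double radius) {
        return items.stream()
                .filter(i -> distance(i.x, i.y, i.z, x, y, z) <= radius)
                .map(i -> i.name)
                .collect(Collectors.toList());
    }
}
```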

In addition to the spatial type, we’ve also added support for date and time operations.

Database Security

We’ve also added a new security feature in this release that focuses on property-level security for keeping private data private.

Property-level security allows you to blacklist certain properties so that users with particular roles are unable to access them. In this example, users with Role X are unable to read property A, and users with Role Y are unable to read properties B and C.
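The idea can be pictured as a filter applied to a node's properties before they reach the user. This is a conceptual sketch only, not how Neo4j is configured or implemented: the PropertyBlacklist class and its role-to-properties map are hypothetical and simply mirror the Role X / Role Y example, and it assumes a property is hidden if any one of the user's roles blacklists it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Conceptual illustration of property-level blacklisting; not Neo4j's configuration or code.
final class PropertyBlacklist {
    // Hypothetical blacklist mirroring the example: Role X cannot read A; Role Y cannot read B and C.
    static final Map<String, Set<String>> BLACKLIST = Map.of(
            "X", Set.of("A"),
            "Y", Set.of("B", "C"));

    /**
     * Returns a copy of the properties without any key blacklisted for the user's roles.
     * Assumption: a property is hidden if any one of the user's roles blacklists it.
     */
    static Map<String, Object> visibleProperties(Map<String, Object> properties, Set<String> roles) {
        Map<String, Object> visible = new LinkedHashMap<>(properties);
        for (String role : roles) {
            visible.keySet().removeAll(BLACKLIST.getOrDefault(role, Set.of()));
        }
        return visible;
    }
}
```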

Try It Out with the Neo4j Sandbox

For the GA release of Neo4j 3.4, we’ve created a special Neo4j Sandbox. The 3.4 sandbox has a guide that guides you through the new date/time type and spatial querying support.

Watch the video for a quick demo of the new Neo4j Sandbox, or try it out yourself by clicking below.

Try Out the Neo4j Sandbox

Original Link

Offers With Neo4j

Neo4j has many retailers as clients, and one of their use cases is making offers to their customers. Today I was with a client who had seen my boolean logic rules engine and decision tree blog posts. They were considering going that route for their offers, but threw down a challenge: could we handle offers using just Cypher? Offers can be of three types: “AllOf” offers require the customer to meet all of the requirements in order to be triggered, “AnyOf” offers require just one of the requirements to be met, and “Majority” offers require a majority of the requirements to be met. The model could look like this:

Let’s go ahead and create some sample data:

CREATE (o:Offer { name: "Offer 1", type:"Majority", from_date: date({ year: 2018, month: 5, day: 1 }), to_date: date({ year: 2018, month: 5, day: 30 }) }),
(req1:Requirement {id:"Product 1"})<-[:REQUIRES]-(o),
(req2:Requirement {id:"Product 2"})<-[:REQUIRES]-(o),
(req3:Requirement {id:"New Customer"})<-[:REQUIRES]-(o),
(req4:Requirement {id:"In Illinois"})<-[:REQUIRES]-(o),
(o2:Offer { name: "Offer 2", type:"AnyOf", from_date: date({ year: 2018, month: 5, day: 1 }), to_date: date({ year: 2018, month: 5, day: 30 })}),
(req5:Requirement {id:"Existing Customer"})<-[:REQUIRES]-(o2),
(req6:Requirement {id:"Last Purchase > 30 Days Ago"})<-[:REQUIRES]-(o2),
(req7:Requirement {id:"In California"})<-[:REQUIRES]-(o2),
(o3:Offer { name: "Offer 3", type:"AllOf", from_date: date({ year: 2018, month: 5, day: 1 }), to_date: date({ year: 2018, month: 5, day: 30 })}),

It looks like this in the Neo4j browser:

Now we are ready to write our query. It needs to return offers that are valid today, and they need to be relevant to the customer, so they must have at least one requirement in common with the customer. We must return the offer, the requirements we meet, all of the offer’s requirements, the missing requirements, and whether or not we qualify for the offer. That sounds pretty complicated, but let’s see the finished query and then walk through it in steps:

MATCH (req:Requirement)<-[:REQUIRES]-(o:Offer)
WHERE o.from_date < date() < o.to_date AND req.id IN ["Product 1", "Product 2", "In Illinois", "Existing Customer"]
WITH o, COLLECT(req.id) AS have
MATCH (o)-[:REQUIRES]->(reqs:Requirement)
WITH o, have, COLLECT(reqs.id) AS need
RETURN o, have, need, CASE o.type WHEN "AnyOf" THEN ANY(x IN need WHERE x IN have)
WHEN "AllOf" THEN ALL(x IN need WHERE x IN have)
WHEN "Majority" THEN SIZE(have) > SIZE(need)/2.0
END AS qualifies, FILTER(x IN need WHERE NOT x IN have) AS missing

Not bad, right? If you have never used Cypher’s CASE or FILTER, the Cypher manual covers both. So, what’s our query doing? First, we use the “date()” function from Neo4j 3.4 to get today’s date and compare it to the from_date and to_date of our offers. The offers need to have at least one requirement that the user has, so we MATCH and use the “IN” operator to find them, then collect them per offer into a list we call “have.”

MATCH (req:Requirement)<-[:REQUIRES]-(o:Offer)
WHERE o.from_date < date() < o.to_date AND req.id IN ["Product 1", "Product 2", "In Illinois", "Existing Customer"]
WITH o, COLLECT(req.id) AS have
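The date filter uses strict comparisons on both sides, so with the sample data an offer is not valid on May 1 or May 30 themselves. Here is a minimal Java sketch of the same boundary logic with java.time.LocalDate (OfferWindow and isValidOn are hypothetical names), handy for checking edge cases outside the database.

```java
import java.time.LocalDate;

// Mirrors the Cypher filter o.from_date < date() < o.to_date; both ends are exclusive.
final class OfferWindow {
    static boolean isValidOn(LocalDate today, LocalDate from, LocalDate to) {
        return from.isBefore(today) && today.isBefore(to);
    }

    public static void main(String[] args) {
        LocalDate from = LocalDate.of(2018, 5, 1);
        LocalDate to = LocalDate.of(2018, 5, 30);
        System.out.println(isValidOn(LocalDate.of(2018, 5, 15), from, to)); // true
        System.out.println(isValidOn(from, from, to));                      // false: first day excluded
        System.out.println(isValidOn(to, from, to));                        // false: last day excluded
    }
}
```

If you want inclusive bounds instead, the Cypher condition would become o.from_date <= date() <= o.to_date.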

Next, we find all of the requirements for our offer and collect them in a list we call “need.”

MATCH (o)-[:REQUIRES]->(reqs:Requirement)
WITH o, have, COLLECT(reqs.id) AS need

Next, we return the offer and the have and need lists, and we use a CASE expression to figure out whether we meet the requirements of the offer. If the offer is of type “AnyOf,” we just need to make sure that at least one requirement we need is in the list we have. If the offer is of type “AllOf,” we need to make sure ALL the requirements are met. ANY and ALL are Cypher predicate functions that return true or false.

RETURN o, have, need, CASE o.type WHEN "AnyOf" THEN ANY(x IN need WHERE x IN have)
WHEN "AllOf" THEN ALL(x IN need WHERE x IN have)

If the offer is of type “Majority,” we make sure the size of the have list is greater than half the size of the need list. A majority means strictly more than 50%; if we wanted “at least 50%,” we could use a greater-than-or-equal comparison instead. Finally, we want to return the missing requirements as well. We use FILTER to get them by checking each requirement in need and keeping those that are not in have.

WHEN "Majority" THEN SIZE(have) > SIZE(need)/2.0
END AS qualifies, FILTER(x IN need WHERE NOT x IN have) AS missing

And there we have it:

So give it a shot, try changing the requirements passed in the array and see how the results change. Remember, you will need Neo4j 3.4.0 or higher because of the use of the new date datatype. So go get it.
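If you also want to unit-test the qualification rules outside the database, the CASE and FILTER logic can be mirrored in Java. This is a minimal sketch assuming have and need are plain lists of requirement ids like the ones the query builds; OfferRules is a hypothetical name, and the Cypher query remains the source of truth.

```java
import java.util.List;
import java.util.stream.Collectors;

// Mirrors the CASE and FILTER logic of the offers query so it can be unit-tested.
final class OfferRules {
    static boolean qualifies(String type, List<String> have, List<String> need) {
        switch (type) {
            case "AnyOf":
                // ANY(x IN need WHERE x IN have)
                return need.stream().anyMatch(have::contains);
            case "AllOf":
                // ALL(x IN need WHERE x IN have)
                return need.stream().allMatch(have::contains);
            case "Majority":
                // SIZE(have) > SIZE(need)/2.0: strictly more than half
                return have.size() > need.size() / 2.0;
            default:
                // A CASE without ELSE yields null here; this sketch fails loudly instead.
                throw new IllegalArgumentException("unknown offer type: " + type);
        }
    }

    /** FILTER(x IN need WHERE NOT x IN have) */
    static List<String> missing(List<String> have, List<String> need) {
        return need.stream().filter(x -> !have.contains(x)).collect(Collectors.toList());
    }
}
```

With the sample data, a customer holding Product 1, Product 2, and In Illinois meets 3 of Offer 1's 4 requirements, which is a majority, and is missing only New Customer.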

Before we wrap up, note that there are other ways to write this query. For example, we could have written the CASE expression this way:

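RETURN o, have, need, CASE o.type WHEN "AnyOf" THEN true
WHEN "AllOf" THEN SIZE(have) = SIZE(need)
WHEN "Majority" THEN SIZE(have) > SIZE(need)/2.0
END AS qualifies, FILTER(x IN need WHERE NOT x IN have) AS missing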

It works because “AnyOf” is always true: we wouldn’t have reached the offer if none of its requirements matched. For “AllOf,” instead of using the ALL predicate, we can simply compare the sizes of the two lists; since have only ever contains requirements of that same offer, it is a subset of need, so equal sizes mean every requirement is met. You may have been tempted to write “have = need,” but the order of the items in the lists is not guaranteed, and lists containing the same values in a different order are not equal.
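Java's List.equals is order-sensitive in the same way, which makes the pitfall easy to demonstrate. A minimal sketch with hypothetical helper names; allMetBySize is only a valid AllOf test under the same assumption as the Cypher version, namely that have never contains an id that is not in need.

```java
import java.util.HashSet;
import java.util.List;

// Hypothetical helpers illustrating why have = need is not a safe AllOf test.
final class ListComparison {
    /** Order-insensitive comparison of the values in two lists (duplicates ignored). */
    static boolean sameValues(List<String> a, List<String> b) {
        return new HashSet<>(a).equals(new HashSet<>(b));
    }

    /** Valid AllOf test only when have can never contain an id that is missing from need. */
    static boolean allMetBySize(List<String> have, List<String> need) {
        return have.size() == need.size();
    }

    public static void main(String[] args) {
        List<String> need = List.of("Product 1", "Product 2", "In Illinois");
        List<String> have = List.of("In Illinois", "Product 1", "Product 2");
        System.out.println(have.equals(need));         // false: same values, different order
        System.out.println(sameValues(have, need));    // true
        System.out.println(allMetBySize(have, need));  // true
    }
}
```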

Original Link

Using NGINX to Proxy a Neo4j Instance [Snippets]

There are cases when you want to access your Neo4j instance remotely and you live in an environment where direct access is not possible. This might be caused by technical or organizational restrictions.

One generic solution to this kind of problem is using a VPN. Another alternative to be discussed in this blog post is using a reverse proxy server. I want to show how you can proxy Neo4j using NGINX.

First of all, run a Neo4j instance. To avoid false positives, I’m using non-standard ports for HTTP (default 7474, using 17474 here) and Bolt (default 7687, using 17687 here). Spinning up a test instance is easy with Docker:

docker run --rm -e NEO4J_AUTH=none -p 17474:7474 -p 17687:7687 neo4j 

Note that I’ve switched off authentication, something that might be ok for testing, but is a clear no-go for any other kind of usage.

I’m installing NGINX directly on my system:

apt install nginx

Then we need to map both communication channels: HTTP and Bolt. For the HTTP part, we add the following snippet inside the server section of /etc/nginx/sites-available/default:

location /browser/ {
    proxy_pass http://localhost:17474/; # <-- replace with your neo4j instance's http servername + port
}

For the Bolt protocol, we add the following to /etc/nginx/nginx.conf (the stream block belongs at the top level, not inside the http block):

stream {
    server {
        listen 7687;
        proxy_pass localhost:17687; # <--- replace this with your neo4j server and bolt port
    }
}

After a restart of NGINX, pointing your browser to http://localhost/browser should show the Neo4j browser.

Original Link

It’s Time for a Single Property Graph Query Language [Vote Now]

The time has come to create a single, unified property graph query language.

Different languages for different products help no one. We’ve heard from the graph community that a common query language would be powerful: more developers with transferable expertise, portable queries, solutions that leverage multiple graph options, and less vendor lock-in.

One language, one skill set.

The Property Graph Space Has Grown…A Lot

Property graph technology has a big presence from Neo4j and SAP HANA to Oracle PGX and Amazon Neptune. An international standard would accelerate the entire graph solution market, to the mutual benefit of all vendors and — more importantly — to all users.

That’s why we are proposing a unified graph query language, GQL (Graph Query Language), that fuses the best of three property graph languages.

Relational Data Has SQL, and Property Graphs Need GQL

Although SQL has been fundamental for relational data, we need a declarative query language for the powerful — and distinct — property graph data model to play a similar role.

Like SQL, the new GQL needs to be an industry standard. It should work with SQL but not be confined by SQL. The result would be better choices for developers, data engineers, data scientists, CIOs, and CDOs alike.

Right now, there are three property graph query languages that are closely related. We have Cypher (from Neo4j and the openCypher community), we have PGQL (from Oracle), and we have G-CORE, a research language proposal from the Linked Data Benchmark Council [LDBC] (co-authored by world-class researchers from the Netherlands, Germany, Chile, and the U.S., and technical staff from SAP, Oracle, Capsenta, and Neo4j).

The proposed GQL (Graph Query Language) would combine the strengths of Cypher, PGQL, and G-CORE into one vendor-neutral and standardized query language for graph solutions, much like SQL is for RDBMS.

These three query languages have similar data models, syntax, and semantics. Each has its merits and gaps, yet their authors share many ambitions for the next generation of graph querying, such as a composable graph query language with graph construction, views, and named graphs, and a pattern-matching facility that extends to regular path queries.

Let Your Voice Be Heard on GQL

The Neo4j team is advocating that the database industry and our users collaborate to define and standardize one language.

Bringing PGQL, G-CORE, and Cypher together, we have a running start. Two of them are industrial languages with thousands of users, and combined with the enhancements of a research language, they all share a common heritage of ASCII art patterns to match, merge, and create graph models.

What matters most right now is a technically strong standard with strong backing among vendors and users. So we’re appealing for your vocal support.

Please vote now on whether we should unite to create a standard Graph Query Language (GQL), in the same manner as SQL.

Should the property graph community unite to create a standard Graph Query Language, GQL, alongside SQL?

For more information, you can read the GQL manifesto here and watch for ongoing updates.

Emil Eifrem, CEO;
Philip Rathle, VP of Products;
Alastair Green, Lead, Query Languages Standards & Research;
for the entire Neo4j team

Original Link

Graph Algorithms in Neo4j: 15 Different Graph Algorithms and What They Do

Graph analytics have value only if you have the skills to use them and if they can quickly provide the insights you need. Therefore, the best graph algorithms are easy to use, are fast to execute, and produce powerful results.

Neo4j includes a growing, open library of high-performance graph algorithms that reveal the hidden patterns and structures in your connected data.

In this series on graph algorithms, we’ll discuss the value of graph algorithms and what they can do for you. Previously, we explored how data connections drive future discoveries and how to streamline those data discoveries with graph analytics.

This week, we’ll take a detailed look at the many graph algorithms available in Neo4j and what they do.

Using Neo4j graph algorithms, you’ll have the means to understand, model, and predict complicated dynamics such as the flow of resources or information, the pathways that contagions or network failures spread, and the influences on and resiliency of groups.

And because Neo4j brings together analytics and transaction operations in a native graph platform, you’ll not only uncover the inner nature of real-world systems for new discoveries but also develop and deploy graph-based solutions faster and have easy-to-use, streamlined workflows. That’s the power of an optimized approach.

Here is a list of the many algorithms that Neo4j uses in its graph analytics platform, along with an explanation of what they do.

Traversal and Pathfinding Algorithms

1. Parallel Breadth-First Search (BFS)

What it does: Traverses a tree data structure by fanning out to explore the nearest neighbors and then their sub-level neighbors. It’s used to locate connections and is a precursor to many other graph algorithms.

BFS is preferred when the tree is less balanced or the target is closer to the starting point. It can also be used to find the shortest path between nodes or avoid the recursive processes of depth-first search.

How it’s used: Breadth-first search can be used to locate neighbor nodes in peer-to-peer networks like BitTorrent, GPS systems to pinpoint nearby locations, and social network services to find people within a specific distance.
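To make the fan-out concrete, here is a minimal single-threaded breadth-first search in Python. It is a sketch, not Neo4j's parallel implementation. The adjacency-dict representation, the bfs_hops and within_hops helper names, and the sample friendship data are all illustrative assumptions. The within_hops helper shows the "people within a specific distance" use case.

```python
from collections import deque


def bfs_hops(graph, start):
    """Breadth-first search returning the hop distance to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in dist:
                # The first time BFS reaches a node is along a fewest-hop path
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def within_hops(graph, start, k):
    """Nodes reachable in 1..k hops, e.g. friends-of-friends when k == 2."""
    return {n for n, d in bfs_hops(graph, start).items() if 0 < d <= k}


# Hypothetical social network as an adjacency dict
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}
print(sorted(within_hops(friends, "ann", 2)))  # ['bob', 'cat', 'dan']
```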

2. Parallel Depth-First Search (DFS)

What it does: Traverses a tree data structure by exploring as far as possible down each branch before backtracking. It’s used on deeply hierarchical data and is a precursor to many other graph algorithms. Depth-first search is preferred when the tree is more balanced or the target is closer to an endpoint.

How it’s used: Depth-first search is often used in gaming simulations where each choice or action leads to another, expanding into a tree-shaped graph of possibilities. It will traverse the choice tree until it discovers an optimal solution path (i.e., a win).
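Below is a minimal depth-first search over a hypothetical choice tree. It uses an explicit stack instead of recursion and stops at the first winning state it reaches. Finding an optimal path would mean continuing the search and comparing outcomes. The tree, the dfs_find_path name, and the goal predicate are assumptions for illustration.

```python
def dfs_find_path(tree, start, is_goal):
    """Depth-first search with an explicit stack; returns the first goal path found."""
    stack = [(start, [start])]
    visited = set()
    while stack:
        node, path = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        if is_goal(node):
            return path
        # Push children in reverse so the first-listed child is explored first
        for child in reversed(tree.get(node, [])):
            if child not in visited:
                stack.append((child, path + [child]))
    return None


# Hypothetical game tree: each key lists the moves available from that state
choices = {
    "start": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "b1": ["win"],
}
print(dfs_find_path(choices, "start", lambda state: state == "win"))
# ['start', 'b', 'b1', 'win']
```

The whole "a" branch is exhausted before "b" is tried, which is the "as far as possible down each branch before backtracking" behavior described above.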

3. Single-Source Shortest Path

What it does: Calculates the paths from a node to every other node such that the summed value of each path (the weight of its relationships, such as cost, distance, time, or capacity) is minimal.

How it’s used: Single-source shortest path is often applied to automatically obtain directions between physical locations, such as driving directions via Google Maps. It’s also essential in logical routing, such as telephone call routing (least-cost routing).
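Dijkstra's algorithm is the classic way to compute single-source shortest paths when weights are non-negative. The sketch below uses it for illustration and does not claim to mirror Neo4j's implementation. The road network and weights are hypothetical, and path_to is a helper introduced here to rebuild a route from the predecessor map.

```python
import heapq


def dijkstra(graph, source):
    """Single-source shortest paths for non-negative weights.

    graph: node -> list of (neighbor, weight). Returns (dist, prev).
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry; a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return dist, prev


def path_to(prev, source, target):
    """Rebuild the route by walking predecessors back from target (must be reachable)."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical one-way roads with travel costs
roads = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 7)],
    "B": [("D", 1)],
}
dist, prev = dijkstra(roads, "A")
print(dist["D"], path_to(prev, "A", "D"))  # 4 ['A', 'C', 'B', 'D']
```

With these weights, the direct A-B road (4) loses to the detour A-C-B (1 + 2), so the best route to D is A, C, B, D at a total cost of 4.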

4. All-Pairs Shortest Path

What it does: Calculates a shortest path forest (group) containing all shortest paths between the nodes in the graph. It’s commonly used for understanding alternate routing when the shortest route is blocked or becomes sub-optimal.

How it’s used: All-pairs shortest path is used to evaluate alternate routes for situations, such as a freeway backup or network capacity. It’s also key in logical routing to offer multiple paths; for example, call routing alternatives.
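One standard way to compute all-pairs shortest paths is the Floyd-Warshall algorithm, sketched below. It runs in O(n³) time. The text does not say whether Neo4j uses this particular method, and the graph and weights are hypothetical. The second call simulates the "blocked route" scenario by dropping one relationship and recomputing.

```python
from itertools import product

INF = float("inf")


def floyd_warshall(nodes, edges):
    """All-pairs shortest distances; edges is a dict {(u, v): weight} (directed)."""
    dist = {(u, v): 0 if u == v else edges.get((u, v), INF)
            for u, v in product(nodes, repeat=2)}
    for k in nodes:          # allow k as an intermediate stop
        for i in nodes:
            for j in nodes:
                if dist[i, k] + dist[k, j] < dist[i, j]:
                    dist[i, j] = dist[i, k] + dist[k, j]
    return dist


nodes = ["A", "B", "C", "D"]
edges = {("A", "B"): 4, ("A", "C"): 1, ("C", "B"): 2, ("B", "D"): 1, ("C", "D"): 7}
normal = floyd_warshall(nodes, edges)
# Simulate a blocked road (C -> B) and compare the detour
blocked = floyd_warshall(nodes, {e: w for e, w in edges.items() if e != ("C", "B")})
print(normal["A", "D"], blocked["A", "D"])  # 4 5
```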

5. Minimum Weight Spanning Tree (MWST)

What it does: Calculates the tree that connects all nodes of a connected graph with the smallest total value (the weight of the relationships, such as cost, time, or capacity). It’s also employed to approximate some NP-hard problems, such as the traveling salesman problem, and in randomized or iterative rounding.

How it’s used: Minimum weight spanning tree is widely used for network designs: least-cost logical or physical routing such as laying cable, fastest garbage collection routes, capacity for water systems, efficient circuit designs, and much more. It also has real-time applications with rolling optimizations, such as processes in a chemical refinery or driving route corrections.
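A minimum weight spanning tree can be built with Prim's algorithm. It grows the tree from one node, always adding the cheapest relationship that reaches a new node. The sketch below assumes an undirected, connected graph stored as an adjacency dict. The site names and cable costs are hypothetical.

```python
import heapq


def prim_mst(graph, start):
    """Minimum weight spanning tree of a connected, undirected graph.

    graph: node -> list of (neighbor, weight). Returns (tree_edges, total_weight).
    """
    visited = {start}
    frontier = [(w, start, v) for v, w in graph[start]]
    heapq.heapify(frontier)
    tree, total = [], 0
    while frontier and len(visited) < len(graph):
        w, u, v = heapq.heappop(frontier)
        if v in visited:
            continue  # adding this edge would create a cycle
        visited.add(v)
        tree.append((u, v, w))
        total += w
        for nxt, nw in graph[v]:
            if nxt not in visited:
                heapq.heappush(frontier, (nw, v, nxt))
    return tree, total


def undirected(edge_list):
    """Build an adjacency dict from (u, v, weight) triples."""
    g = {}
    for u, v, w in edge_list:
        g.setdefault(u, []).append((v, w))
        g.setdefault(v, []).append((u, w))
    return g


# Hypothetical cable-laying costs between four sites
sites = undirected([("A", "B", 3), ("A", "C", 1), ("B", "C", 7), ("B", "D", 5), ("C", "D", 2)])
tree, total = prim_mst(sites, "A")
print(tree, total)  # [('A', 'C', 1), ('C', 'D', 2), ('A', 'B', 3)] 6
```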

Centrality Algorithms

6. PageRank

What it does: Estimates a current node’s importance from its linked neighbors and then again from their neighbors. A node’s rank is derived from the number and quality of its transitive links to estimate influence. Although popularized by Google, it’s widely recognized as a way of detecting influential nodes in any network.

How it’s used: PageRank is used in quite a few ways to estimate importance and influence. It’s used to suggest Twitter accounts to follow and for general sentiment analysis.

PageRank is also used in machine learning to identify the most influential features for extraction. In biology, it’s been used to identify which species extinctions within a food web would lead to the biggest chain reaction of species deaths.
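The following is a textbook power-iteration PageRank. It assumes a directed graph stored as an adjacency dict in which every node appears as a key. The damping factor of 0.85 is the commonly cited default. The iteration count and the sample "follows" data are illustrative assumptions, not Neo4j's settings.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; graph maps node -> list of outgoing links.

    Every link target must also be a key. Dangling nodes (no outgoing links)
    spread their rank evenly over all nodes.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            out = graph[v]
            for w in out:
                # Each node splits its current rank evenly across its links
                new[w] += damping * rank[v] / len(out)
        rank = new
    return rank


# Hypothetical "who follows whom" accounts
follows = {
    "alice": ["carol"],
    "bob": ["carol"],
    "dave": ["carol", "alice"],
    "carol": ["alice"],
}
ranks = pagerank(follows)
print(max(ranks, key=ranks.get))  # carol
```

Carol ranks highest because every other account links to her. Alice comes a close second because Carol's single outgoing link points to her. This shows how the quality of incoming links, not just their number, drives the score.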

7. Degree Centrality

What it does: Measures the number of relationships a node (or an entire graph) has. It’s broken into indegree (flowing in) and outdegree (flowing out) where relationships are directed.

How it’s used: Degree centrality looks at immediate connectedness for uses such as evaluating the near-term risk of a person catching a virus or hearing information. In social studies, the indegree of friendship can be used to estimate popularity and outdegree as gregariousness.
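Degree centrality is simple counting. Here is a minimal sketch over a hypothetical list of directed friendship nominations, given as (from, to) pairs. In-degree serves as the popularity proxy and out-degree as the gregariousness proxy described above.

```python
from collections import Counter


def degree_centrality(edges):
    """In-degree and out-degree counts for a directed edge list."""
    indeg, outdeg = Counter(), Counter()
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return indeg, outdeg


# Hypothetical survey: ("ann", "bob") means ann names bob as a friend
friendships = [("ann", "bob"), ("cat", "bob"), ("dan", "bob"), ("bob", "ann"), ("ann", "cat")]
indeg, outdeg = degree_centrality(friendships)
print(indeg.most_common(1), outdeg.most_common(1))  # [('bob', 3)] [('ann', 2)]
```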

8. Closeness Centrality

What it does: Measures how central a node is to all its neighbors within its cluster. Nodes with the shortest paths to all other nodes are assumed to be able to reach the entire group the fastest.

How it’s used: Closeness centrality is applicable in a number of resource, communication, and behavioral analyses, especially when interaction speed is significant. It has been used to identify the best locations for new public services for maximum accessibility.

In social network analysis, it is used to find people with the ideal social network location for faster dissemination of information.
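Here is a minimal sketch of closeness centrality for an unweighted, undirected graph. It runs a breadth-first search from each node and divides the number of reachable nodes by the sum of their hop distances. This normalization is one common convention among several, and the street network is hypothetical.

```python
from collections import deque


def hop_distances(graph, start):
    """BFS hop counts from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def closeness(graph):
    """(reachable nodes) / (sum of hop distances to them), per node.

    Unreachable nodes are ignored, so scores are relative to each node's component.
    """
    scores = {}
    for v in graph:
        dist = hop_distances(graph, v)
        total = sum(dist.values())
        scores[v] = (len(dist) - 1) / total if total else 0.0
    return scores


# Hypothetical street network laid out as a line: a - b - c - d - e
streets = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
scores = closeness(streets)
print(max(scores, key=scores.get))  # c
```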

9. Betweenness Centrality

What it does: Measures the number of shortest paths (first found with breadth-first search) that pass through a node. Nodes that most frequently lie on shortest paths have higher betweenness centrality scores and are the bridges between different clusters. It is often associated with the control over the flow of resources and information.

How it’s used: Betweenness centrality applies to a wide range of problems in network science and is used to pinpoint bottlenecks or likely attack targets in communication and transportation networks. In genomics, it has been used to understand the control certain genes have in protein networks for improvements such as better drug/disease targeting.

Betweenness centrality has also been used to evaluate information flows among multiplayer online gamers and within expertise-sharing communities of physicians.
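Betweenness is typically computed with Brandes' algorithm. It runs one breadth-first search per node and then accumulates path "dependencies" backwards. The sketch below assumes an unweighted, undirected graph stored as an adjacency dict. The two-cluster sample graph is hypothetical, with node x as the bridge.

```python
from collections import deque


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph (adjacency dict).

    Each pair of nodes is seen from both ends, so totals are halved at the end.
    """
    score = {v: 0.0 for v in graph}
    for source in graph:
        order = []                        # nodes in BFS order
        preds = {v: [] for v in graph}    # predecessors on shortest paths
        sigma = {v: 0 for v in graph}     # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):         # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    return {v: total / 2 for v, total in score.items()}


# Hypothetical graph: two triangles joined through the bridge node "x"
clusters = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "x"],
    "x": ["c", "d"],
    "d": ["x", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
scores = betweenness(clusters)
print(max(scores, key=scores.get))  # x
```

Node x scores highest because all nine shortest paths between the two clusters cross it. That is the bridge role described above.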

Community Detection Algorithms

This category is also known as clustering algorithms or partitioning algorithms.

10. Label Propagation

What it does: Spreads labels based on neighborhood majorities as a means of inferring clusters. This extremely fast graph partitioning requires little prior information and is widely used in large-scale networks for community detection. It’s a key method for understanding the organization of a graph and is often a primary step in other analysis.

How it’s used: Label propagation has diverse applications, from understanding consensus formation in social communities to identifying sets of proteins that are involved together in a process (functional modules) for biochemical networks. It’s also used in semi- and unsupervised machine learning as an initial preprocessing step.
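The following is a small, deterministic sketch of label propagation. Real implementations usually visit nodes in random order and break ties randomly. Here, as a simplifying assumption, nodes are visited in sorted order. A node keeps its label if that label is among the most common; otherwise the smallest tied label wins. The two-team graph is hypothetical.

```python
from collections import Counter


def label_propagation(graph, max_rounds=20):
    """Deterministic label propagation on an undirected adjacency dict."""
    labels = {v: v for v in graph}  # every node starts in its own community
    for _ in range(max_rounds):
        changed = False
        for v in sorted(graph):
            if not graph[v]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[n] for n in graph[v])
            best = max(counts.values())
            tied = [label for label, c in counts.items() if c == best]
            # Keep the current label if it is among the most common; else take the smallest
            new = labels[v] if labels[v] in tied else min(tied)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:  # converged: no node wants to switch
            break
    return labels


# Hypothetical graph: two tightly knit teams joined by one a-h relationship
two_teams = {
    "a": ["b", "c", "d", "h"], "b": ["a", "c", "d"], "c": ["a", "b", "d"], "d": ["a", "b", "c"],
    "e": ["f", "g", "h"], "f": ["e", "g", "h"], "g": ["e", "f", "h"], "h": ["e", "f", "g", "a"],
}
labels = label_propagation(two_teams)
print(sorted(set(labels.values())))  # ['b', 'f']
```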

11. Strongly Connected

What it does: Locates groups of nodes where each node is reachable from every other node in the same group following the direction of relationships. It’s often computed using depth-first search.

How it’s used: The strongly connected components algorithm is often used to enable running other algorithms independently on an identified cluster. As a preprocessing step for directed graphs, it helps quickly identify disconnected groups. In retail recommendations, it helps identify groups with strong affinities that are then used for suggesting commonly preferred items to those within that group who have not yet purchased the item.
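Kosaraju's algorithm is one textbook way to find strongly connected components. It makes two depth-first passes, one on the graph and one on its reverse. Tarjan's algorithm is a single-pass alternative. The recursive sketch below assumes a small directed graph in which every node appears as a key. Very deep graphs would need an iterative version to avoid Python's recursion limit. The sample graph is hypothetical.

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm: DFS finishing order, then DFS on the reversed graph."""
    order, seen = [], set()

    def visit(v):  # pass 1: record nodes by finishing time
        seen.add(v)
        for w in graph[v]:
            if w not in seen:
                visit(w)
        order.append(v)

    for v in graph:
        if v not in seen:
            visit(v)

    reverse = {v: [] for v in graph}
    for v in graph:
        for w in graph[v]:
            reverse[w].append(v)

    components, assigned = [], set()

    def collect(v, comp):  # pass 2: everything reachable in the reversed graph
        assigned.add(v)
        comp.append(v)
        for w in reverse[v]:
            if w not in assigned:
                collect(w, comp)

    for v in reversed(order):
        if v not in assigned:
            comp = []
            collect(v, comp)
            components.append(sorted(comp))
    return components


# Hypothetical directed graph: a -> b -> c -> a is a cycle, d <-> e is another
g = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"]}
print(strongly_connected_components(g))  # [['a', 'b', 'c'], ['d', 'e']]
```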

12. Union-Find/Connected Components/Weakly Connected

What it does: Finds groups of nodes where each node is reachable from any other node in the same group, regardless of the direction of relationships. It provides near constant-time operations (independent of input size) to add new groups, merge existing groups, and determine whether two nodes are in the same group.

How it’s used: Union-find/connected components is often used in conjunction with other algorithms, especially for high-performance grouping. As a preprocessing step for undirected graphs, it helps quickly identify disconnected groups.
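A disjoint-set forest is the usual data structure behind union-find. With union by size and path compression, each operation runs in amortized near-constant time, which is where the "near constant-time" claim comes from. The class below is a minimal sketch. The relationship list is hypothetical, and directions are simply ignored, which is what makes the result weakly connected components.

```python
class UnionFind:
    """Disjoint-set forest with union by size and path compression."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        if x not in self.parent:          # unseen node starts as its own group
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:     # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:  # attach the smaller tree under the larger
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

    def connected(self, a, b):
        return self.find(a) == self.find(b)


# Hypothetical directed relationships; direction is ignored (weakly connected)
relationships = [("a", "b"), ("c", "b"), ("d", "e")]
uf = UnionFind()
for src, dst in relationships:
    uf.union(src, dst)
print(uf.connected("a", "c"), uf.connected("a", "d"))  # True False
```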

13. Louvain Modularity

What it does: Measures the quality (i.e., presumed accuracy) of a community grouping by comparing its relationship density to a suitably defined random network. It’s often used to evaluate the organization of complex networks and community hierarchies in particular. It’s also useful for initial data preprocessing in unsupervised machine learning.

How it’s used: Louvain is used to evaluate social structures on Twitter, LinkedIn, and YouTube. It’s used in fraud analytics to evaluate whether a group has just a few bad behaviors or is acting as a fraud ring that would be indicated by a higher relationship density than average. Louvain revealed a six-level customer hierarchy in a Belgian telecom network.
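Louvain repeatedly moves nodes between communities and then merges each community into a single node, keeping changes that raise the modularity score Q. The sketch below computes only Q for a given partition of an undirected, unweighted graph. It is not the full Louvain procedure. The two-triangle graph is hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2
    L_c: relationships inside c, d_c: total degree of c, m: total relationships.
    """
    m = len(edges)
    inside = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical graph: two triangles joined by the single relationship c-d
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
split = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
lumped = {n: 0 for n in "abcdef"}
print(round(modularity(edges, split), 3), modularity(edges, lumped))  # 0.357 0.0
```

Splitting the graph into its two triangles gives Q = 5/14 (about 0.36), while lumping everything into one community gives Q = 0. The two-triangle grouping is the denser-than-random structure that Louvain would keep.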

14. Local Clustering Coefficient/Node Clustering Coefficient

What it does: For a particular node, it quantifies how close its neighbors are to being a clique (every node is directly connected to every other node). For example, if all your friends knew each other directly, your local clustering coefficient would be 1. Small values for a cluster would indicate that although a grouping exists, the nodes are not tightly connected.

How it’s used: The local clustering coefficient is important for estimating resilience by understanding the likelihood of group coherence or fragmentation. An analysis of a European power grid using this method found that clusters with sparsely connected nodes were more resilient against widespread failures.
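Following the definition above, a node's local clustering coefficient is the number of links among its neighbors divided by the number of possible links, which is k(k-1)/2 for k neighbors. Here is a minimal sketch, assuming an undirected graph stored as a dict of neighbor sets. The friendship data is hypothetical.

```python
from itertools import combinations


def local_clustering(graph, node):
    """Links among node's neighbors divided by the k*(k-1)/2 possible links."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention: too few neighbors to form any pair
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return links / (k * (k - 1) / 2)


# Hypothetical friendships: in "tight" everyone knows everyone
tight = {"you": {"a", "b", "c"}, "a": {"you", "b", "c"},
         "b": {"you", "a", "c"}, "c": {"you", "a", "b"}}
loose = {"you": {"a", "b", "c"}, "a": {"you", "b"}, "b": {"you", "a"}, "c": {"you"}}
print(local_clustering(tight, "you"), local_clustering(loose, "you"))  # 1.0 0.3333333333333333
```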

15. Triangle-Count and Average Clustering Coefficient

What it does: Measures how many nodes have triangles and the degree to which nodes tend to cluster together. The average clustering coefficient is 1 when there is a clique and 0 when there are no connections. For the clustering coefficient to be meaningful, it should be significantly higher than a version of the network where all of the relationships have been shuffled randomly.

How it’s used: The average clustering coefficient is often used to estimate whether a network might exhibit “small-world” behaviors that are based on tightly knit clusters. It’s also a factor for cluster stability and resiliency. Epidemiologists have used the average clustering coefficient to help predict various infection rates for different communities.
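Triangle counting and the average clustering coefficient can be sketched in a few lines. Each triangle is seen once from each of its three corners, hence the division by 3. Treating nodes with fewer than two neighbors as 0 is a convention assumed here. The clique and star graphs are hypothetical test cases for the two extremes.

```python
from itertools import combinations


def triangles_per_node(graph):
    """Triangles each node belongs to, for an undirected dict of neighbor sets."""
    return {v: sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
            for v, nbrs in graph.items()}


def total_triangles(graph):
    # Every triangle is counted once from each of its three corners
    return sum(triangles_per_node(graph).values()) // 3


def average_clustering(graph):
    """Mean local clustering coefficient; nodes with < 2 neighbors count as 0."""
    if not graph:
        return 0.0
    tri = triangles_per_node(graph)
    total = 0.0
    for v, nbrs in graph.items():
        k = len(nbrs)
        if k >= 2:
            total += tri[v] / (k * (k - 1) / 2)
    return total / len(graph)


clique = {"a": {"b", "c", "d"}, "b": {"a", "c", "d"},
          "c": {"a", "b", "d"}, "d": {"a", "b", "c"}}
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
print(total_triangles(clique), average_clustering(clique))  # 4 1.0
print(total_triangles(star), average_clustering(star))      # 0 0.0
```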


The world is driven by connections. Neo4j graph analytics reveals the meaning of those connections using practical, optimized graph algorithms including the ones detailed above.

This concludes our series on graph algorithms in Neo4j. We hope these algorithms help you make sense of your connected data in more meaningful and effective ways.

The Basics of Databases

Welcome back to our monthly database series! Last time, we took a look at the biggest database articles and news from the month of March. In this article, we’re going to look at some introductory database articles on DZone, explore the concept of databases elsewhere on the web, and look at some publications related to databases.


Check out some of the top introductory database articles on DZone to understand the basics of databases:

  1. The Types of Modern Databases by John Hammink. Where do you begin in choosing a database? We’ve looked at both NoSQL and relational database management systems to come up with a bird’s eye view of both ecosystems to get you started.
  2. Making Graph Databases Fun Again With Java by Otavio Santana. Graph databases need to be made fun again! Not to worry — the open-source TinkerPop from Apache is here to do just that.
  3. How Are Databases Evolving? by Tom Smith. One way that databases are evolving is through the integration and convergence of technologies on the cloud using microservices.
  4. 10 Easy Steps to a Complete Understanding of SQL by Lukas Eder. Too many programmers think SQL is a bit of a beast. It’s one of the few declarative languages out there, and as such, behaves in an entirely different way from imperative, object-oriented, or even functional languages.
  5. MongoDB vs. MySQL by Mihir Shah. There are many database management systems in the market to choose from. So how about a faceoff between two dominant solutions that are close in popularity?

PS: Are you interested in contributing to DZone? Check out our Bounty Board, where you can apply for specific writing prompts and win prizes!

Databasin’ It Up

Let’s journey outside of DZone and check out some recent news, conferences, and more that should be of interest to database newbies.

Dive Even Deeper Into Database

DZone has Guides and Refcardz on pretty much every tech-related topic, but if you’re specifically interested in databases, these will appeal the most to you.

  1. The DZone Guide to Databases: Speed, Scale, and Security. Advances in database technology have traditionally been lethargic. That trend has shifted recently with a need to store larger and more dynamic data. This DZone Guide is focused on how to prepare your database to run faster, scale with ease, and effectively secure your data.
  2. Graph-Powered Search: Neo4j & Elasticsearch. In this Refcard, learn through code and examples how combining these technologies adds another level of quality to search results.

Graph Algorithms in Neo4j: Streamline Data Discoveries With Graph Analytics

To analyze the billions of relationships in your connected data, you need efficiency and high performance, as well as powerful analytical tools that address a wide variety of graph problems.

Fortunately, graph algorithms are up to the challenge.

In this series on graph algorithms, we’ll discuss the value of graph algorithms and what they can do for you. Last week, we explored how data connections drive future discoveries. This week, we’ll take a closer look at Neo4j’s Graph Analytics platform and put its performance to the test.

The Neo4j Graph Analytics Platform

Neo4j offers a reliable and performant native-graph platform that reveals the value and maintains the integrity of connected data.

First, we delivered the Neo4j graph database, originally used in online transaction processing with exceptionally fast traversals. Then, we added advanced, yet practical, graph analytics tools for data scientists and solutions teams.

Streamline Your Data Discoveries

We offer a growing, open library of high-performance graph algorithms for Neo4j that are easy to use and optimized for fast results. These algorithms reveal the hidden patterns and structures in your connected data around community detection, centrality, and pathways with a core set of tested (at scale) and supported algorithms.

The highly extensible nature of Neo4j enabled us to create this graph library and expose it as procedures, without making any modification to the Neo4j database.

These algorithms can be called upon as procedures (from our APOC library), and they’re also customizable through a common graph API. This set of advanced, global graph algorithms is simple to apply to existing Neo4j instances, so your data scientists, solutions developers, and operational teams can all use the same native graph platform.

Neo4j also includes graph projection, an extremely handy feature that places a logical sub-graph into a graph algorithm when your original graph has the wrong shape or granularity for that specific algorithm.

For example, if you’re looking to understand the relationship between drug results for men versus women, but your graph is not partitioned for this, you’ll be able to temporarily project a sub-graph to quickly run your algorithm upon and move on to the next step.
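The idea of projecting a sub-graph can be illustrated in plain Python. Keep the nodes that satisfy a predicate and the relationships whose endpoints both survive. This is not Neo4j's projection syntax. The project_subgraph function, the patient and drug records, and the property names are all hypothetical.

```python
from collections import Counter


def project_subgraph(nodes, edges, keep):
    """Return the nodes that satisfy keep() and the edges between kept nodes."""
    kept = {n for n, props in nodes.items() if keep(props)}
    return kept, [(u, v) for u, v in edges if u in kept and v in kept]


# Hypothetical patients, drugs, and TOOK relationships (patient -> drug)
records = {
    "p1": {"kind": "patient", "sex": "F"},
    "p2": {"kind": "patient", "sex": "M"},
    "p3": {"kind": "patient", "sex": "F"},
    "d1": {"kind": "drug"},
    "d2": {"kind": "drug"},
}
took = [("p1", "d1"), ("p2", "d1"), ("p3", "d1"), ("p3", "d2"), ("p2", "d2")]

women_nodes, women_edges = project_subgraph(
    records, took, lambda p: p["kind"] == "drug" or p.get("sex") == "F")
# Any algorithm can now run on the projection, e.g. a per-drug count
print(Counter(drug for _, drug in women_edges))
```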

Example: High Performance of Neo4j Graph Algorithms

Neo4j graph algorithms are extremely efficient, so you can analyze billions of relationships using common equipment and get your results in seconds to minutes, or in a few hours for the most complicated queries.

The chart below shows how Neo4j’s optimized algorithms yield results up to three times faster than Apache Spark(TM) GraphX for Union-Find (Connected Components) and PageRank on the Twitter-2010 dataset with 1.4 billion relationships.

Even more impressive, running the Neo4j PageRank algorithm on a significantly larger dataset with 18 billion relationships and 3 billion nodes delivered results in only 1 hour and 45 minutes (using 144 CPUs and 1TB of RAM).

In addition to optimizing the algorithms themselves, we’ve parallelized key areas such as loading and preparing data as well as algorithms like breadth-first search and depth-first search where applicable.


As you can see, using graph algorithms helps you surface hidden connections and actionable insights obscured within your hordes of data. But even more importantly, the right graph algorithms are optimized to keep your computing costs and time investment to a minimum. Those graph algorithms are available to you now via the Neo4j Graph Platform, and they’re waiting to help you with your next data breakthrough.

Next week, we’ll explore specific graph algorithms, describing what they do and how they’re used.

Java-Related April Fools Day Pranks

Although you’d never catch me stooping to this level, it has been interesting over the years to see some of the effort and thought put into Java-related April Fools’ Day pranks. This post references and summarizes some of them.

Google Annotations Gallery (gag)

The Google Annotations Gallery (cleverly abbreviated as ‘gag’) is hosted on Google Code, so you may want to download it as soon as possible so that you do not miss out on it. Two ZIP files are available: the original release and a supplement to the original release that adds “many great user-suggested annotations.” These ZIP files include actual Java source code with the libraries that gag depends on.

Some of my favorite annotations provided by gag are @AhaMoment, @Blame, @BossMadeMeDoIt, @Facepalm, @Hack, @HandsOff, @IAmAwesome, @LegacySucks, @Magic, @Noop, and @OhNoYouDidnt.

I also enjoy the WHERE enumeration provided by ‘gag’ to allow one to specify “where” a particular annotation’s meaning may have occurred. Values for WHERE cover the most likely locations to think up the best ideas (most “free time”): BATH, BED, BORING_MEETING, DMV, GYM_WORKOUT, SHOWER, TOILET, and TRAFFIC_JAM.

I was negligent in not mentioning the ‘gag’ library in my recent post on how to effectively divert blame.

New OpenJDK Project: Even Faster JDK Releases

This year (2018), the discuss OpenJDK mailing list includes a couple of threads with April Fools’ Day hoaxes. One of these, “New project proposal: even faster JDK releases,” is particularly timely given the relatively recent change to a new Java release every six months. The new cadence has caused some concerns, such as those described in “The Java release train is moving faster, but will developers be derailed?”

The April 1 proposal proposes “the creation of project Doomed, which aims to solve an extremely important issue caused by the currently specified fast release schedule, that of an equally fast adoption.” Before making several other arguments for Project Doomed, the proposal states, “With project Doomed we aim at continuous release and deployment of the JDK, thus removing the need to have any version number and increase the adoption rate considerably and better position the JDK in the fast pacing world of cloud development.”

New OpenJDK Project: The Block GC

Another April 1 thread on the discuss OpenJDK mailing list starts with the post “New Project Proposal: The Block GC.” The proposal here is for “Block Chain GC”, “an innovative new system for Garbage Collection.” Among other advertised virtues of the Block Chain garbage collector is the ability for it to be “used to calculate hash values for popular cryptocurrencies, a.k.a. ‘bitcoin mining’”. The proposal also outlines the default recipients of the revenues generated by the Block Chain garbage collector: “by default, the revenue extracted by the Block GC miner will be stored in the Block GC Project account. This revenue will be divided as follows: 90% will go to the initial committers of the Block GC Project, and 10% will go to the OpenJDK community.”

Apache Software Foundation Sold to Oracle

The 2010 April Fools post “The Apache Software Foundation Receives Approval for Sale to Oracle Corporation” announced “Today, the Apache Software Foundation announced it has agreed to sell all IP and assets of the foundation to Oracle.”

Frostbyte

ZeroTurnaround announced Frostbyte on April Fools Day in 2012 and advertised it as “a new stack-based language for the JVM” that was “born out of frustration of working with the standard Java software stack and tools.” Among Frostbyte’s advertised features were “use of inverse reverse Polish notation with parentheses” and “the built-in default language is Estonian.” Perhaps the most promising feature of Frostbyte was “built-in AI that is able to make aesthetic judgments about your code and will outright disallow ugly code, over-abstractions, and excessive copy-and-pasting.”

Goto in Java

Another 2010 April Fools Day announcement was Joe Darcy’s “Goto for the Java Programming Language.” “Endorsed” by “Edsger Dijkstra” (author of “go to statement considered harmful“), this proposal advertises that it will “Provide the benefits of the time-testing goto control structure to Java programs.” Fortunately, Java still doesn’t have that form of a “goto,” but it does have its own narrowly-scoped similar capability.

Neo 4 Java

On April Fools’ Day 2016, the Neo4j blog announced Neo 4 Java, “a proprietary 100% pure Arabica available in the caffeine aisle soon, or possibly right at your desk if you happen to have a 3D printer or a really good intern.”

Minecraft Java Edition Textures Finally Perfected

In “Java Edition Textures Finally Perfected,” it was announced on April Fools’ Day this year that “a new default texture pack for the Java Edition of Minecraft” was being released. Not everyone thought this was funny because it apparently cost some Minecraft users quite a bit of time before they realized it was a one-day prank. A Minecraft bug, MC-127786, was reported with this moderator response, “April fools! This is an April Fools’ joke by Mojang. Textures will return back to normal once April Fools’ Day is over.” Minecraft users should probably be especially wary of April Fools’ Day pranks because it’s not the first time that Mojang has pulled one.


Several of the April Fools’ Day posts described above required a surprising amount of ingenuity, effort, and time.

This Week in Neo4j: Graph Visualization, GraphQL, Spatial, Scheduling, Python

Welcome to this week in Neo4j, where we round up what’s been happening in the world of graph databases in the last seven days. As my colleague Mark Needham is on his well-earned vacation, I’m filling in this week.

Next week we plan to do something different. Stay tuned!

Jeffrey A. Miller works as a Senior Consultant in Columbus, Ohio supporting clients in a wide variety of topics. Jeffrey has delivered presentations (slides) at regional technical conferences and user groups on topics including Neo4j graph technology, knowledge management, and humanitarian healthcare projects.

Jeffrey A. Miller: This Week’s Featured Community Member

Jeffrey published a really interesting Graph Gist on the Software Development Process Model. He was recently interviewed on the Cross-Cutting Concerns Podcast about his work with Neo4j.

Jeffrey and his wife, Brandy, are aspiring adoptive parents and have written a fun children’s book called “Skeeters” with proceeds supporting adoption.

On behalf of the Neo4j community, thanks for all your work, Jeffrey!

  • The infamous Max De Marzi demonstrates how to use Neo4j for a common meeting room scheduling task. Quite impressive Cypher queries in there.
  • Max also demos another new feature of Neo4j 3.4: geospatial indexes. In his blog post, he describes how to use them to find the right type of food place for your tastes via the geolocation of the city that you’re both in.
  • There seems to be a lot of recent interest in Python front-ends for Neo4j: Timothée Mazzucotelli created NeoPy, which is in early alpha but contains some nice ideas.
  • Zeqi Lin has a number of cool repositories for importing different types of data into Neo4j, e.g., Java classes, Git commits, or parts of Docx documents, and even SnowGraph, a software data analytics platform built on Neo4j.
  • I think I came across this before, but newrelic-neo4j is a really neat way of getting Neo4j metrics into NewRelic; thanks, Ștefan-Gabriel Muscalu. While browsing his repositories, I also came across this WikiData Neo4j Importer, which I need to test out.
  • This AutoComplete system uses Neo4j to store terms, counts, and other associated information. It returns the top 10 suggestions for auto-complete and tracks usage patterns.
  • Sam answered a question on counting distinct paths on StackOverflow.

Nigel is teasing us:

A new version of py2neo is coming soon. Designed for Neo4j 3.x, this will remove the previously mandatory HTTP dependency and include a new set of command line tools and other goodies. Expect an alpha release within the next few days.

Graph Visualizations

I had some fun this week with 3d-force-graph and Neo4j. It was really easy to combine this 3D graph visualization project, which is based on three.js and available in 2D, 3D, for VR, and as React components, with the Neo4j JavaScript driver. Graphs with up to 5,000 relationships load in under a second.

See the results of my experiments in my repository which also links to several live versions of different setups (thanks to rawgit).

My colleague Will got an access key to Graphistry and used this Jupyter Notebook to load the Russian Twitter trolls from Neo4j.


I also came across another Cytoscape plugin for Neo4j, which looks quite useful.

Zhihong SHEN created a Data Visualizer for larger Neo4j graphs using vis.js; you can see an online demo here.

Desktop and GraphQL

This week’s update of Neo4j Desktop has seen the addition of the neo4j-graphql extension that our team has been working on for a while.

There will be more detail about it from Will next week but I wanted to share a sneak preview for all of you that want to have some fun with GraphQL and Neo4j over the weekend.

Tweet of the Week

My favorite tweet this week was our own Easter Bunny:

Don’t forget to RT if you liked it, too.

Meet SemSpect: A Different Approach to Graph Visualization

(As community content, this post reflects the views and opinions of the particular author and does not necessarily reflect the official stance of Neo4j.)

Understanding large graphs is challenging. Sure, a proper Cypher query can retrieve valuable information. But how do you find the pivotal queries when the structure of your graph is not known? In this post, I discuss SemSpect: a tool that uses a visualization paradigm that lets you visualize and interactively query large graphs ad hoc to understand, analyze, and track your graph data as a whole.

Given a large property graph, how do you gain meaningful insights from it?

For instance, what groups of nodes relate to each other? Are there any characteristics in the network or unexpected connections?

Exploring such patterns can help you understand the overall graph structure and discover anomalies in the data. Trying to invent Cypher queries to make all those patterns explicit is not always a reasonable solution.

Fortunately, the Neo4j apoc.meta.* procedures provide some helpful features in this respect. They ship with the optional APOC procedure library available from Neo4j. For instance, to depict the overall structure of a Neo4j graph you can use:

CALL apoc.meta.graph

For the Neo4j dump of the Paradise Papers data from the ICIJ, the result looks as follows:

While already helpful, this graph visualization is just a static rendering and does not expose any relationships to nodes of the underlying original graph. Furthermore, this meta-graph may itself become confusing for graphs with more diverse node labels or relationship types.

Overview: Details on Demand

In our experience with business-critical graphs, making sense of a large graph dataset requires data-driven exploration and data-sensitive visualization.

Our SemSpect tool aims at enabling even domain and query novices to carry out sophisticated graph research by interacting with a visual representation of the data network.

This data visualization approach is different from commonly known property graph renderings. SemSpect groups nodes by their label and aggregates relationships between groups unless the user asks for details. That difference is key to keeping users oriented and informed when working with large graphs.
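The grouping idea resembles a meta-graph: collapse nodes into one group per label and count the relationships between groups. The sketch below assumes one label per node, although Neo4j nodes can carry several. It uses a tiny, hypothetical sample modeled on the Officer and Entity groups discussed here and is not SemSpect's implementation.

```python
from collections import Counter


def group_by_label(node_labels, relationships):
    """Collapse nodes into one group per label and aggregate relationships.

    node_labels: node id -> label (assumes a single label per node)
    relationships: (source id, relationship type, target id) triples
    """
    group_sizes = Counter(node_labels.values())
    group_edges = Counter(
        (node_labels[src], rel_type, node_labels[dst]) for src, rel_type, dst in relationships
    )
    return group_sizes, group_edges


# Tiny hypothetical sample, not taken from the Paradise Papers data
nodes = {"o1": "Officer", "o2": "Officer", "o3": "Officer",
         "e1": "Entity", "e2": "Entity", "i1": "Intermediary"}
rels = [("o1", "OFFICER_OF", "e1"), ("o2", "OFFICER_OF", "e1"),
        ("o3", "OFFICER_OF", "e2"), ("i1", "INTERMEDIARY_OF", "e2")]
sizes, edges = group_by_label(nodes, rels)
print(sizes["Officer"], edges[("Officer", "OFFICER_OF", "Entity")])  # 3 3
```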

Let’s see how this works by playing with the previously mentioned Paradise Papers: suppose a user selects the Officer group as the root of an exploration query (see the image below).

SemSpect depicts this group as a labeled circle showing its number of nodes in the center. The tool guides the user by offering data-driven choices for expanding the exploration graph with the help of a menu to choose a group (called a category in SemSpect) and a relationship for instant exploration.

The expansion choice above will result in an exploration graph (depicted as a tree, spanning from a root group from left to right) showing all officers and the entities to which they have an OFFICER_OF relationship.

As mentioned before, SemSpect aggregates nodes and individual relationships for clarity and comprehensibility. Only when the overall number of nodes in a group is below an adjustable threshold are the nodes shown as gray dots within the group, as displayed below for the 39 intermediaries related to all officers.

A number in a particular node indicates the number of related nodes in the preceding group. When a node is selected, its property keys are shown in a dossier, and its directly or indirectly related nodes in other groups are highlighted (when visible).

Connecting the Dots of a Graph

A tabular view lists details of nodes on demand as shown in the screenshot below.

To create a custom group of Officers from Monaco we just need to open the tabular view for Officers (1) and search for “Monaco” in the countries column (2). The resulting selection can be applied as a filter with one click (3). As a consequence of filtering the Officer group, all other depending groups in the exploration graph are adapted accordingly.
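The propagation step can be pictured like this: filter the root group, then recompute each downstream group from the nodes that remain. This is a hypothetical sketch of the idea along a single linear path of the exploration tree, not SemSpect's actual algorithm. The node IDs and the REGISTERED_ADDRESS relationship type are assumptions.

```python
def propagate_filter(root_group, steps, relationships, keep):
    """Filter the root group, then recompute each downstream group.

    root_group: node id -> properties for the root group
    steps: relationship types followed, in order, along one path of the tree
    relationships: (source id, relationship type, target id) triples
    """
    groups = [{n for n, props in root_group.items() if keep(props)}]
    for rel_type in steps:
        previous = groups[-1]
        # A downstream node survives only if a surviving upstream node links to it
        groups.append({dst for src, t, dst in relationships
                       if t == rel_type and src in previous})
    return groups


# Hypothetical officers and relationships
officers = {"o1": {"countries": "Monaco"}, "o2": {"countries": "Malta"},
            "o3": {"countries": "Monaco"}}
rels = [("o1", "OFFICER_OF", "e1"), ("o2", "OFFICER_OF", "e2"), ("o3", "OFFICER_OF", "e3"),
        ("e1", "REGISTERED_ADDRESS", "a1"), ("e2", "REGISTERED_ADDRESS", "a2")]
groups = propagate_filter(officers, ["OFFICER_OF", "REGISTERED_ADDRESS"], rels,
                          lambda props: props["countries"] == "Monaco")
print([sorted(g) for g in groups])  # [['o1', 'o3'], ['e1', 'e3'], ['a1']]
```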

The Officers from Monaco can now be named and saved as a custom group. There are many more features in SemSpect such as selective filter propagation, reporting, etc., so I’ll have to elaborate in a follow-up blog post.

Fairly complex queries can be built by successively exploring groups or nodes and interactive filtering. Clearly, the query expressivity of SemSpect does not cover all of Cypher. Instead, its specific strength lies in the data-driven guidance while exploring and intuitive filtering options for querying the graph without learning any query syntax.

For those who often poke around in the dark with their Cypher queries, SemSpect is a great tool to explore their graph data, to answer complex queries and to find data quality issues.

If you want to try it on your own with the Offshore Leaks data, just jump here.

The Technology Underneath

SemSpect has a Web UI based on HTML5/JavaScript. The Java backend incorporates GraphScale, a technology that can inject reasoning into graph stores such as Neo4j, as I briefly introduced in a previous post.

This implies that SemSpect can draw on full RDFS and OWL 2 RL reasoning capabilities. However, RDF-based data is not a requirement. We are currently adapting SemSpect such that it can be applied directly to virtually any Neo4j graph database. In such a case, the graph abstraction computed by GraphScale is used as the key index for graph exploration and filtering.

Scheduling Meetings With Neo4j

One of the symptoms of any fast-growing company is the lack of available meeting rooms. The average office worker gets immense satisfaction in an otherwise mundane workday from kicking someone else out of the meeting room they booked. Of course, that joy can be cut short (along with their career) upon realizing that some unnoticed VIP was unceremoniously kicked out. It’s not a super exciting use case, but today, I’m going to show you how to use Neo4j to perform some scheduling gymnastics.

Let’s start with what the data model looks like:

So, we have a Person who sits in a Cubicle that is located on a Floor that has meeting Rooms where Meetings are booked, and these Meetings are attended by People. That’s a nice circular model right there. Let’s build an example with three people, each in their own cubicle, two floors, four meeting rooms (two on each floor), and a bunch of meetings. We’ll also book one of the people into one of the existing meetings. We will use Longs for the times, representing Unix epoch time in milliseconds. In Neo4j 3.4, we will have legitimate date and datetime data types, so you will be able to create date times like localdatetime({year:1984, month:10, day:11, hour:12, minute:31, second:14}) instead of this hot mess, but regardless, here is the Cypher for this example:

CREATE (person1:Person {name: "Max"})
CREATE (person2:Person {name: "Alex"})
CREATE (person3:Person {name: "Andrew"})
CREATE (cube1A:Cubicle {name: "F1A"})
CREATE (cube1B:Cubicle {name: "F1B"})
CREATE (cube2A:Cubicle {name: "F2A"})
CREATE (floor1:Floor {name: "Floor 1"})
CREATE (floor2:Floor {name: "Floor 2"})
CREATE (person1)-[:SITS_IN]->(cube1A)
CREATE (person2)-[:SITS_IN]->(cube1B)
CREATE (person3)-[:SITS_IN]->(cube2A)
CREATE (cube1A)-[:LOCATED_IN]->(floor1)
CREATE (cube1B)-[:LOCATED_IN]->(floor1)
CREATE (cube2A)-[:LOCATED_IN]->(floor2)
CREATE (room1:Room {name:"Room 1"})
CREATE (room2:Room {name:"Room 2"})
CREATE (room3:Room {name:"Room 3"})
CREATE (room4:Room {name:"Room 4"})
CREATE (room1)-[:LOCATED_IN]->(floor1)
CREATE (room2)-[:LOCATED_IN]->(floor1)
CREATE (room3)-[:LOCATED_IN]->(floor2)
CREATE (room4)-[:LOCATED_IN]->(floor2)
CREATE (m1:Meeting {start_time: 1521534600000, end_time:1521538200000}) // 8:30-9:30am
CREATE (m2:Meeting {start_time: 1521543600000, end_time:1521550800000}) // 11-1pm
CREATE (m3:Meeting {start_time: 1521550800000, end_time:1521558000000}) // 1-3pm
CREATE (m4:Meeting {start_time: 1521534600000, end_time:1521543600000}) // 8:30-11am
CREATE (m5:Meeting {start_time: 1521550800000, end_time:1521554400000}) // 1-2pm
CREATE (m6:Meeting {start_time: 1521561600000, end_time:1521565200000}) // 4-5pm
CREATE (m7:Meeting {start_time: 1521558000000, end_time:1521561600000}) // 3-4pm
CREATE (room1)-[:IS_BOOKED_ON_2018_03_20]->(m1)
CREATE (room1)-[:IS_BOOKED_ON_2018_03_20]->(m2)
CREATE (room1)-[:IS_BOOKED_ON_2018_03_20]->(m3)
CREATE (room2)-[:IS_BOOKED_ON_2018_03_20]->(m4)
CREATE (room2)-[:IS_BOOKED_ON_2018_03_20]->(m5)
CREATE (room2)-[:IS_BOOKED_ON_2018_03_20]->(m6)
CREATE (room4)-[:IS_BOOKED_ON_2018_03_20]->(m7)
CREATE (person2)-[:HAS_MEETING_ON_2018_03_20]->(m7)

This Cypher script creates this lovely set of data:

By looking at it, we can see how it is all connected, but it is not immediately obvious at what times and in which rooms people are able to meet. That is going to be our question: given a set of meeting attendees and a datetime range, find the available meeting times in the rooms that are on the same floor as at least one of the attendees. Let’s build this query together one piece at a time. The first thing I want to do is find out which time ranges I have to eliminate because one of the attendees is already booked for another meeting.

MATCH (p:Person)
WHERE p.name IN ["Max", "Alex", "Andrew"]
OPTIONAL MATCH (p)-[:HAS_MEETING_ON_2018_03_20]->(m:Meeting)
RETURN p, m

│"p" │"m" │
│{"name":"Max"} │null │
│{"name":"Alex"} │{"end_time":1521561600000,"start_time":1521558000000}│
│{"name":"Andrew"}│null │

So, it looks like Alex is busy from 3-4 PM. Next, we need to figure out where everyone sits, what floor they are on, and what rooms we are able to meet in. So, our query looks like this:

MATCH (p:Person)-[:SITS_IN]->(c:Cubicle)-[:LOCATED_IN]->(f:Floor)<-[:LOCATED_IN]-(r:Room)
WHERE p.name IN ["Max", "Alex", "Andrew"]
RETURN DISTINCT r

This gets us Rooms 1-4 as expected:

│"r" │
│{"name":"Room 1"}│
│{"name":"Room 2"}│
│{"name":"Room 3"}│
│{"name":"Room 4"}│
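The traversal is easy to picture in code. Here is a minimal Java sketch that follows the same Person → Cubicle → Floor ← Room pattern over in-memory maps holding the sample data. The maps and the `candidateRooms` method are hypothetical stand-ins for the graph, not a Neo4j API. Collecting the rooms into a set removes the duplicates that Max and Alex would otherwise produce, since they both sit on Floor 1.

```java
import java.util.*;

// Hypothetical in-memory stand-in for the sample graph (not a Neo4j API).
class RoomFinder {
    // Person -> Cubicle (SITS_IN)
    static final Map<String, String> SITS_IN = Map.of(
        "Max", "F1A", "Alex", "F1B", "Andrew", "F2A");
    // Cubicle -> Floor (LOCATED_IN)
    static final Map<String, String> CUBICLE_FLOOR = Map.of(
        "F1A", "Floor 1", "F1B", "Floor 1", "F2A", "Floor 2");
    // Room -> Floor (LOCATED_IN)
    static final Map<String, String> ROOM_FLOOR = Map.of(
        "Room 1", "Floor 1", "Room 2", "Floor 1", "Room 3", "Floor 2", "Room 4", "Floor 2");

    // Rooms on any floor where at least one attendee sits; the set removes duplicates.
    static SortedSet<String> candidateRooms(List<String> attendees) {
        Set<String> floors = new HashSet<>();
        for (String person : attendees) {
            String cubicle = SITS_IN.get(person);
            if (cubicle != null) {
                floors.add(CUBICLE_FLOOR.get(cubicle));
            }
        }
        SortedSet<String> rooms = new TreeSet<>();
        for (Map.Entry<String, String> e : ROOM_FLOOR.entrySet()) {
            if (floors.contains(e.getValue())) {
                rooms.add(e.getKey());
            }
        }
        return rooms;
    }
}
```

Passing only Max returns Room 1 and Room 2, because he sits on Floor 1.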

OK, so far, so good. Now, we need to know if those rooms have already been booked for other meetings today.

MATCH (p:Person)-[:SITS_IN]->(c:Cubicle)-[:LOCATED_IN]->(f:Floor)<-[:LOCATED_IN]-(r:Room)
WHERE p.name IN ["Max", "Alex", "Andrew"]
OPTIONAL MATCH (r)-[:IS_BOOKED_ON_2018_03_20]->(m:Meeting)
WITH r, m ORDER BY m.start_time
RETURN r.name, COLLECT(DISTINCT m) AS meetings

This query tells us that Room 1 has three meetings scheduled, Room 2 has three as well, Room 3 is wide open, and Room 4 has just one. But it is really hard to see the actual times, since they are shown as Longs.

│"r.name"│"meetings" │
│"Room 1"│[{"end_time":1521538200000,"start_time":1521534600000},{"end_time":1521550800000,"start_time":1521543600000},{"end_time":1521558000000,"start_time":1521550800000}]│
│"Room 2"│[{"end_time":1521543600000,"start_time":1521534600000},{"end_time":1521554400000,"start_time":1521550800000},{"end_time":1521565200000,"start_time":1521561600000}]│
│"Room 3"│[] │
│"Room 4"│[{"end_time":1521561600000,"start_time":1521558000000}] │

If you are using the Neo4j APOC plugin, you can use the apoc.date.format function to make them friendlier. In Neo4j 3.4, you will be able to use a built-in function datetime.FromEpochMillis for the same thing.

MATCH (p:Person)-[:SITS_IN]->(c:Cubicle)-[:LOCATED_IN]->(f:Floor)<-[:LOCATED_IN]-(r:Room)
WHERE p.name IN ["Max", "Alex", "Andrew"]
OPTIONAL MATCH (r)-[:IS_BOOKED_ON_2018_03_20]->(m:Meeting)
WITH r, m ORDER BY m.start_time
RETURN r.name, EXTRACT (x IN COLLECT(DISTINCT m) | apoc.date.format(x.start_time,'ms','HH:mm') + ' to ' + apoc.date.format(x.end_time,'ms','HH:mm')) AS meetings

Here we go; now, that is way more readable:

│"r.name"│"meetings" │
│"Room 1"│["08:30 to 09:30","11:00 to 13:00","13:00 to 15:00"]│
│"Room 2"│["08:30 to 11:00","13:00 to 14:00","16:00 to 17:00"]│
│"Room 3"│[] │
│"Room 4"│["15:00 to 16:00"] │
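If you want to sanity-check these conversions outside the database, the standard java.time API can do the same formatting. This minimal sketch assumes the times should be read in UTC, which is consistent with 1521534600000 being shown as 08:30 above. The class name is illustrative.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Formats epoch-millisecond Longs as HH:mm, assuming UTC.
class EpochFormat {
    private static final DateTimeFormatter HH_MM =
        DateTimeFormatter.ofPattern("HH:mm").withZone(ZoneOffset.UTC);

    static String format(long epochMillis) {
        return HH_MM.format(Instant.ofEpochMilli(epochMillis));
    }

    // Builds the same "start to end" strings as the query above.
    static String range(long startMillis, long endMillis) {
        return format(startMillis) + " to " + format(endMillis);
    }
}
```

For example, `EpochFormat.range(1521543600000L, 1521550800000L)` gives "11:00 to 13:00", matching Room 1's second meeting.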

Alright, let’s combine the two queries together and see what rooms we can meet in and what times we can’t meet in those rooms because they are either already booked, or one of our attendees is busy:

MATCH (p:Person)
WHERE p.name IN ["Max", "Alex", "Andrew"]
OPTIONAL MATCH (p)-[:HAS_MEETING_ON_2018_03_20]->(m:Meeting)
WITH COLLECT(m) AS occupied
MATCH (p:Person)-[:SITS_IN]->(c:Cubicle)-[:LOCATED_IN]->(f:Floor)<-[:LOCATED_IN]-(r:Room)
WHERE p.name IN ["Max", "Alex", "Andrew"]
WITH r, occupied
OPTIONAL MATCH (r)-[:IS_BOOKED_ON_2018_03_20]->(m:Meeting)
WITH r, COLLECT(DISTINCT m) + occupied AS meetings
UNWIND meetings AS m
WITH r, m ORDER BY m.start_time
RETURN r.name, EXTRACT (x IN COLLECT(m) | apoc.date.format(x.start_time,'ms','HH:mm') + ' to ' + apoc.date.format(x.end_time,'ms','HH:mm')) AS meetings

│"r.name"│"meetings" │
│"Room 1"│["08:30 to 09:30","11:00 to 13:00","13:00 to 15:00","15:00 to 16:00"]│
│"Room 2"│["08:30 to 11:00","13:00 to 14:00","15:00 to 16:00","16:00 to 17:00"]│
│"Room 3"│["15:00 to 16:00"] │
│"Room 4"│["15:00 to 16:00","15:00 to 16:00"] │

Now, we could stop here, let our application mark those times as unavailable, and call it a day. But what we really want is the opposite of that: the times when the rooms and attendees are available. So, how do we figure that out? For each meeting in a room, we want to find the start time of the next meeting in that room. The time slot between meetings is what we are after, defined by one entry’s end time and the start time of the next event. To do this, we are going to use a double-unwind, which is basically “for each thing in the list, I want to pair it (get a cross product) with every other thing in the list.” Typically, this is the last thing you want to do, since making a cross product can be very expensive, but it makes perfect sense for this query because each room only has a handful of meetings per day. We only care about the pairs where one meeting’s start time is greater than or equal to the other’s end time, and from these, we will grab our time slot, as the query below shows:

MATCH (p:Person)
WHERE p.name IN ["Max", "Alex", "Andrew"]
OPTIONAL MATCH (p)-[:HAS_MEETING_ON_2018_03_20]->(m:Meeting)
WITH COLLECT(m) AS occupied
MATCH (p:Person)-[:SITS_IN]->(c:Cubicle)-[:LOCATED_IN]->(f:Floor)<-[:LOCATED_IN]-(r:Room)
WHERE p.name IN ["Max", "Alex", "Andrew"]
WITH r, occupied
OPTIONAL MATCH (r)-[:IS_BOOKED_ON_2018_03_20]->(m:Meeting)
WITH r, [{start_time:1521565200000, end_time:1521534600000}] + COLLECT(m) + occupied AS meetings
UNWIND meetings AS m
WITH r, [min(m.start_time), max(m.end_time)] AS rslot, COLLECT(m) AS mm
WITH r, rslot, mm
UNWIND mm AS m1
UNWIND mm AS m2
WITH r, rslot, m1, m2 WHERE (m2.start_time >= m1.end_time)
WITH r, rslot, [m1.end_time, min(m2.start_time)] AS slot
ORDER BY slot[0]
RETURN r.name, EXTRACT (x IN COLLECT(slot) | apoc.date.format(x[0],'ms','HH:mm') + ' to ' + apoc.date.format(x[1],'ms','HH:mm')) AS available

Our output looks close, but it’s not quite there. Rooms 3 and 4 look correct, but Rooms 1 and 2 have slots whose start and end times are the same:

│"r.name"│"available" │
│"Room 1"│["08:30 to 08:30","09:30 to 11:00","13:00 to 13:00","15:00 to 15:00","16:00 to 17:00"]│
│"Room 2"│["08:30 to 08:30","11:00 to 13:00","14:00 to 15:00","16:00 to 16:00","17:00 to 17:00"]│
│"Room 3"│["08:30 to 15:00","16:00 to 17:00"] │
│"Room 4"│["08:30 to 15:00","16:00 to 17:00"] │

So, let’s fix that by adding the start and end of the day as boundaries and not allowing any slots that start and end at the same time:

MATCH (p:Person)
WHERE p.name IN ["Max", "Alex", "Andrew"]
OPTIONAL MATCH (p)-[:HAS_MEETING_ON_2018_03_20]->(m:Meeting)
WITH COLLECT(m) AS occupied
MATCH (p:Person)-[:SITS_IN]->(c:Cubicle)-[:LOCATED_IN]->(f:Floor)<-[:LOCATED_IN]-(r:Room)
WHERE p.name IN ["Max", "Alex", "Andrew"]
WITH r, occupied
OPTIONAL MATCH (r)-[:IS_BOOKED_ON_2018_03_20]->(m:Meeting)
WITH r, [{start_time:1521565200000, end_time:1521534600000}] + COLLECT(m) + occupied AS meetings
UNWIND meetings AS m
WITH r, [min(m.start_time), max(m.end_time)] AS rslot, COLLECT(m) AS mm
WITH r, rslot, mm
UNWIND mm AS m1
UNWIND mm AS m2
WITH r, rslot, m1, m2 WHERE (m2.start_time >= m1.end_time)
WITH r, rslot, [m1.end_time, min(m2.start_time)] AS slot
ORDER BY slot[0]
WITH r, [[1521534600000, rslot[0]]] + collect(slot) + [[rslot[1], 1521565200000]] AS open
WITH r, filter(x IN open WHERE x[0]<>x[1]) AS available
UNWIND available AS dups
WITH r, COLLECT(DISTINCT dups) AS tslots
RETURN r.name AS Room, EXTRACT (x IN tslots | apoc.date.format(x[0],'ms','HH:mm') + ' to ' + apoc.date.format(x[1],'ms','HH:mm')) AS Available

…and there we go:

│"Room" │"Available" │
│"Room 1"│["09:30 to 11:00","16:00 to 17:00"]│
│"Room 2"│["11:00 to 13:00","14:00 to 15:00"]│
│"Room 3"│["08:30 to 15:00","16:00 to 17:00"]│
│"Room 4"│["08:30 to 15:00","16:00 to 17:00"]│
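If the query is easier to follow as imperative code, the whole double-UNWIND pipeline can also be written in plain Java. This is a minimal sketch under stated assumptions:

- Meetings are `long[]{start, end}` pairs in epoch milliseconds.
- The day runs from 08:30 to 17:00 UTC, as in the sample data.
- Like the query, it does not guard against bookings that overlap one another.

The class and method names are illustrative only.

```java
import java.util.*;

// Plain-Java mirror of the double-UNWIND query (illustrative, not a Neo4j API).
class DoubleUnwindSlots {
    static final long DAY_START = 1521534600000L; // 08:30 UTC
    static final long DAY_END = 1521565200000L;   // 17:00 UTC

    static List<long[]> freeSlots(List<long[]> bookings) {
        // Sentinel that "starts" at the end of the day and "ends" at its start,
        // like {start_time:1521565200000, end_time:1521534600000} in the query.
        List<long[]> meetings = new ArrayList<>();
        meetings.add(new long[]{DAY_END, DAY_START});
        meetings.addAll(bookings);

        // rslot: earliest start and latest end across all entries.
        long minStart = Long.MAX_VALUE, maxEnd = Long.MIN_VALUE;
        for (long[] m : meetings) {
            minStart = Math.min(minStart, m[0]);
            maxEnd = Math.max(maxEnd, m[1]);
        }

        // Cross product: for each end time, keep the earliest start at or after it.
        // Keying a TreeMap by end time groups like the aggregation and sorts like ORDER BY slot[0].
        TreeMap<Long, Long> slots = new TreeMap<>();
        for (long[] m1 : meetings) {
            for (long[] m2 : meetings) {
                if (m2[0] >= m1[1]) {
                    slots.merge(m1[1], m2[0], Long::min);
                }
            }
        }

        // Add the day boundaries, drop zero-length slots, remove duplicates (keeping order).
        List<long[]> open = new ArrayList<>();
        open.add(new long[]{DAY_START, minStart});
        slots.forEach((start, end) -> open.add(new long[]{start, end}));
        open.add(new long[]{maxEnd, DAY_END});

        List<long[]> result = new ArrayList<>();
        Set<List<Long>> seen = new HashSet<>();
        for (long[] s : open) {
            if (s[0] != s[1] && seen.add(List.of(s[0], s[1]))) {
                result.add(s);
            }
        }
        return result;
    }
}
```

Feeding it Room 1's three bookings plus Alex's 15:00 to 16:00 meeting yields 09:30 to 11:00 and 16:00 to 17:00, which matches the table above.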

Pretty neat, right? To be totally honest, I didn’t come up with this query by myself. I had a ton of help from Alex Price and Andrew Bowman.

I asked Michael Hunger, and he had another idea: ordering the meeting times and using lists and ranges instead of a double unwind to get the same answer. Here, he is also using apoc.date.parse('2018-03-20 08:30:00') instead of 1521534600000 to make the query more readable. Yes, these dates will be much nicer to work with in Neo4j 3.4… I can’t wait, either.

MATCH (p:Person)
WHERE p.name IN ["Max", "Alex", "Andrew"]
OPTIONAL MATCH (p)-[:HAS_MEETING_ON_2018_03_20]->(m:Meeting)
WITH COLLECT(m) AS occupied
MATCH (p:Person)-[:SITS_IN]->(c:Cubicle)-[:LOCATED_IN]->(f:Floor)<-[:LOCATED_IN]-(r:Room)
WHERE p.name IN ["Max", "Alex", "Andrew"]
WITH DISTINCT r, occupied
OPTIONAL MATCH (r)-[:IS_BOOKED_ON_2018_03_20]->(m:Meeting)
WITH r, occupied + COLLECT(m {.start_time, .end_time}) AS meetings
UNWIND meetings AS m
WITH r, m ORDER BY m.start_time
WITH r, COLLECT(m) AS meetings
WITH r, meetings, {end_time:apoc.date.parse('2018-03-20 08:30:00')} + meetings + {start_time:apoc.date.parse('2018-03-20 17:00:00')} AS bookedSlots
WITH r, meetings, [idx IN range(0, size(bookedSlots)-2) | {start_time:(bookedSlots[idx]).end_time, end_time:(bookedSlots[idx+1]).start_time}] AS allSlots
WITH r, meetings, [slot IN allSlots WHERE slot.end_time - slot.start_time > 10*60*1000] AS openSlots
WITH r, [slot IN openSlots WHERE NONE(m IN meetings WHERE slot.start_time < m.start_time < slot.end_time OR slot.start_time < m.end_time < slot.end_time)] AS freeSlots
RETURN r, [slot IN freeSlots | apoc.date.format(slot.start_time,'ms','HH:mm') + " to " + apoc.date.format(slot.end_time,'ms','HH:mm')] AS free
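Michael's sorted-list version can also be sketched in Java. This minimal sketch mirrors the query's steps:

1. Sort the bookings.
2. Pair each booking's end with the next booking's start.
3. Keep gaps longer than ten minutes.
4. Drop any gap that a booking starts or ends inside.

Bookings are `long[]{start, end}` pairs in epoch milliseconds, and the class name is illustrative.

```java
import java.util.*;

// Sorted-list approach to finding free slots (illustrative sketch).
class SortedGapSlots {
    static final long MIN_GAP_MILLIS = 10 * 60 * 1000L; // same as 10*60*1000 in the query

    static List<long[]> freeSlots(List<long[]> bookings, long dayStart, long dayEnd) {
        List<long[]> meetings = new ArrayList<>(bookings);
        meetings.sort(Comparator.comparingLong(m -> m[0]));

        // bookedSlots: a pseudo-meeting ending at dayStart, the meetings, one starting at dayEnd.
        List<long[]> booked = new ArrayList<>();
        booked.add(new long[]{dayStart, dayStart});
        booked.addAll(meetings);
        booked.add(new long[]{dayEnd, dayEnd});

        List<long[]> free = new ArrayList<>();
        for (int idx = 0; idx < booked.size() - 1; idx++) {
            long start = booked.get(idx)[1];    // end of this entry
            long end = booked.get(idx + 1)[0];  // start of the next entry
            if (end - start <= MIN_GAP_MILLIS) {
                continue; // too short (or negative) to be an open slot
            }
            boolean clash = false; // NONE(m IN meetings WHERE ...)
            for (long[] m : meetings) {
                if ((start < m[0] && m[0] < end) || (start < m[1] && m[1] < end)) {
                    clash = true;
                    break;
                }
            }
            if (!clash) {
                free.add(new long[]{start, end});
            }
        }
        return free;
    }
}
```

With Room 2's bookings plus Alex's meeting, it returns 11:00 to 13:00 and 14:00 to 15:00. In Room 4, Alex's meeting is also the room's booking, so it appears twice. That duplicate only produces a negative gap, which the length filter discards.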

If you want expert help with your Cypher queries (and anything else Neo4j), be sure to join our Neo4j Users Slack Group, where over 7,500 Neo4j users hang out.

Original Link

Theo 4.0 Release: The Swift Driver for Neo4j

Last week, I wrote about Graph Gopher, the Neo4j client for iPhone. I mentioned that it was built alongside version 4.0 of Theo, the Swift language driver for Neo4j. Today, we’ll explore the Theo 4.0 update in more detail.

But before we dive into the Theo update, let’s have a look at what Theo looks like with a few common code examples:

  • Instantiating Theo
  • Creating a Node and Getting the Newly Created Node Back, Complete With Error Handling
  • Looking Up a Node by ID, Including Error Handling and Handling if the Node Was Not Found
  • Performing a Cypher Query and Getting the Results
  • Performing a Cypher Query Multiple Times With Different Parameters as Part of a Transaction, Then Rolling It Back

As you can see, it reads very much the way you would expect Swift code to read, and it integrates with Neo4j very much the way you would expect a Neo4j integration to. So there is no steep learning curve, and you can start being productive right away.

What’s New in Theo 4.0

Now for the update story:

Theo 4.0 had a few goals:

  • Make a results-oriented API
  • Support Swift 4.0
  • Remove REST support

Theo 3.1 was our first version to support Bolt, and while it has matured since then, it turned out to be very stable, memory-efficient and fast right out of the gate.

We learned from using Theo 3 that a completion-block-based API that could throw exceptions, while easy to reason about, could be rather verbose, especially for doing many tasks in a transaction. For version 4, we explored – and ultimately decided upon – a Result type-based API.

That means that a request would still include a completion block, but it would be called with a Result type that would contain either the values successfully queried for, or an error describing the failure.

Theo 3: a throwing function with a regular completion block. Theo 4: the same example, but with a Result type in the completion block instead.

This allowed us to add parsing that matched each query directly, so the code using the driver could drop its own result parsing. For our example project, Theo-example, the result was a lot less code. That means less code to debug and maintain.
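Theo is written in Swift, but the Result-in-completion-block pattern carries over to other languages. As an illustration only, here is a minimal Java sketch:

- `Result` holds either a value or an error.
- A hypothetical `DemoClient` always reports through its callback instead of throwing.

Neither class is part of Theo or of any Neo4j driver.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Minimal Result type: holds either a value or an error (illustrative, not Theo's API).
final class Result<T> {
    private final T value;
    private final Exception error;

    private Result(T value, Exception error) {
        this.value = value;
        this.error = error;
    }

    static <T> Result<T> success(T value) { return new Result<>(value, null); }

    static <T> Result<T> failure(Exception error) { return new Result<>(null, error); }

    boolean isSuccess() { return error == null; }

    // Applies exactly one of the two handlers, so both outcomes are handled in one place.
    <R> R fold(Function<? super T, ? extends R> onSuccess,
               Function<? super Exception, ? extends R> onFailure) {
        return isSuccess() ? onSuccess.apply(value) : onFailure.apply(error);
    }
}

// Hypothetical client whose completion callback always receives a Result.
class DemoClient {
    void execute(String cypher, Consumer<Result<List<String>>> completion) {
        if (cypher.trim().isEmpty()) {
            completion.accept(Result.failure(new IllegalArgumentException("empty query")));
        } else {
            completion.accept(Result.success(List.of("row for: " + cypher)));
        }
    }
}
```

Because the callback receives a single value that is either a success or a failure, the caller handles both outcomes in one place. There is no need to combine a throwing call with a separate completion block.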

Theo-example connection screen. Theo-example main screen.

Theo 3.2 added Swift 4 support, in addition to Swift 3. The main purpose of Theo 4, other than incorporating the improvements made to the Bolt implementation, was to remove the REST client, which had been marked as deprecated as of 3.2.

Having Theo 3.2 compatible with Swift 4 meant that projects using the REST client could use this as a target for a while going forward, giving them plenty of time to update. We committed to keeping this branch alive until Swift 5 arrived.

The main reason to remove the REST client was that the legacy Cypher HTTP endpoint it was using has been deprecated. This was the endpoint Theo 1 had been built around. Bolt is the preferred way for drivers, and hence it made little sense to adapt the REST client to the transactional Cypher HTTP endpoint that succeeds the legacy Cypher HTTP endpoint.

The result of these changes is an API that is really powerful, yet easy to use. The developer feedback we’ve gotten so far has been very positive. Theo 4 was in beta for a very long time and is now mature enough that we use it in our own products, such as Graph Gopher.

Going forward with Theo 4, the main plan is to fix bugs, ensure support for new Neo4j versions, and make minor improvements based on community input.

Looking Forward to Theo 5.0

The next exciting part will be Theo 5, which will start taking shape when Swift 5 is nearing release.

The next major API change will be when Swift updates its concurrency model so that the API will stay close to the recommended Swift style. Specifically, we are hoping that Swift 5 will bring an async-await style concurrency model that we would then adapt to Theo. But it may very well be that this will have to wait until later Swift versions.

Other Ways to Connect to Neo4j Using the Swift Programming Language

If you think Theo is holding your hand too much, you can use Bolt directly through the Bolt-Swift project. The API is fairly straightforward to use, and hey, if you need an example project, you can always browse the Theo source code.

Another interesting project to come out of Theo and its Bolt support is PackStream-Swift. PackStream is the format that Bolt uses to serialize objects, in a way similar to the more popular MessagePack format. So if you simply need a way to archive your data or send it over a protocol other than Bolt, perhaps PackStream will fit your needs.
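To give a feel for the format, here is a minimal Java sketch of how a few PackStream values are laid out: a one-byte marker, sometimes followed by a size or a big-endian payload. It covers only null, booleans, integers and strings. The marker values reflect our reading of the PackStream v1 specification, so check them against the official Bolt documentation before relying on them. This is not the PackStream-Swift API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Encodes a small subset of PackStream v1 (null, booleans, integers, strings).
class PackStreamSketch {
    static byte[] pack(Object value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value == null) {
            out.write(0xC0);                                  // NULL
        } else if (value instanceof Boolean) {
            out.write((Boolean) value ? 0xC3 : 0xC2);         // TRUE / FALSE
        } else if (value instanceof Integer || value instanceof Long) {
            packInt(out, ((Number) value).longValue());
        } else if (value instanceof String) {
            packString(out, (String) value);
        } else {
            throw new IllegalArgumentException("unsupported type: " + value.getClass());
        }
        return out.toByteArray();
    }

    private static void packInt(ByteArrayOutputStream out, long v) {
        if (v >= -16 && v <= 127) {
            out.write((int) v);                               // TINY_INT: the value itself
        } else if (v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE) {
            out.write(0xC8);                                  // INT_8
            writeBigEndian(out, v, 1);
        } else if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) {
            out.write(0xC9);                                  // INT_16
            writeBigEndian(out, v, 2);
        } else if (v >= Integer.MIN_VALUE && v <= Integer.MAX_VALUE) {
            out.write(0xCA);                                  // INT_32
            writeBigEndian(out, v, 4);
        } else {
            out.write(0xCB);                                  // INT_64
            writeBigEndian(out, v, 8);
        }
    }

    private static void packString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        int n = utf8.length;
        if (n < 16) {
            out.write(0x80 | n);                              // TINY_STRING: marker carries the length
        } else if (n <= 0xFF) {
            out.write(0xD0);                                  // STRING_8
            writeBigEndian(out, n, 1);
        } else if (n <= 0xFFFF) {
            out.write(0xD1);                                  // STRING_16
            writeBigEndian(out, n, 2);
        } else {
            out.write(0xD2);                                  // STRING_32
            writeBigEndian(out, n, 4);
        }
        out.write(utf8, 0, n);
    }

    private static void writeBigEndian(ByteArrayOutputStream out, long v, int bytes) {
        for (int i = bytes - 1; i >= 0; i--) {
            out.write((int) (v >>> (8 * i)) & 0xFF);
        }
    }
}
```

For example, the string "A" packs to two bytes (0x81, 0x41), and 128 no longer fits a tiny int, so it becomes 0xC9 0x00 0x80.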

Give Us Your Feedback!

You can ask questions about Theo either on Stack Overflow (preferred) or in the #neo4j-swift channel in the neo4j-users Slack.

If you find issues, we’d love a pull request with a proposed solution. But even if you do not have a solution, please file an issue on the GitHub page.

We hope you enjoy using Theo 4.0!

Original Link

This Week in Neo4j: Property Based Access Control, Cypher, and User Path Analysis

Welcome to This Week in Neo4j, where we round up what’s been happening in the world of graph databases in the last 7 days.

This week, we have a sneak peek at property-based access control in Neo4j 3.4, user path analysis with Snowplow Analytics, resources to get started with the Cypher query language, and more!

This week’s featured community member is Iryna Feuerstein, Software Engineer at PRODYNA – Neo4j Partner and sponsor of the GraphTour.

Iryna has been part of the Neo4j community for several years, is the organizer of the Düsseldorf Neo4j Meetup group, and has given a number of talks and workshops on Neo4j around the German-speaking region.

This week, Iryna gave an introduction to Neo4j for kids at the JavaLand conference and a talk on modeling and importing each paragraph and section of German law into the graph.

Iryna’s work on importing and querying the Comparative Toxicogenomics Database, relating environmental factors to human health, is really interesting too. She will give a workshop on this topic on May 25 in Berlin.

On behalf of the Neo4j community, thanks for all your work Iryna!

Keeping Properties Secret in Neo4j

We are frequently asked how to do property-based access control in Neo4j, and Max De Marzi has written a post giving a sneak peek of this feature, which will be released in Neo4j 3.4.


Max shows us how this works by going through an example based on node properties indicating the existence (or not!) of aliens. You can download an alpha version of Neo4j that has this feature from the other releases page of neo4j.com.

Intro to Cypher

This week we have a couple of excellent resources for getting started with the graph query language Cypher.

In Big Data analytics with Neo4j and Java, Part 1, Steven Haines shows how to model a social network in MySQL and Neo4j using examples from the Neo4j In Action book.

He shows how to create and query a social graph of his family and their friends, with detailed explanations of Cypher’s CREATE and MATCH clauses.

If you prefer video content, Esteve Serra Clavera released the Cypher Syntax part of his Introduction to Neo4j online course.

Neo4j-GraphQL, Extending R for Neo4j, Indie Music Network

On the Podcast: Dilyan Damyanov

This week on the podcast, Rik interviewed Dilyan Damyanov, Data Scientist at Snowplow Analytics.

They talk about Dilyan’s work on path analysis and how Snowplow have been able to use graphs to track people moving through the different stages of a marketing funnel and work out which marketing touch causes them to convert.

Dilyan also presented at the Neo4j Online Meetup where he showed how to write Cypher queries that enable this kind of analysis.

Next Week

What’s happening next week in the world of graph databases?

Tweet of the Week

My favourite tweet this week was by Daniel Gallagher:

Today I made the switch to Neo4j to feed @Graphistry. The natural ability to be able to draw inferred user relationships simply off of tweet interaction is awesome!

I thought I had done something wrong here, but this led me directly to an account that is a weird anomaly…

Don’t forget to RT if you liked it too.

Original Link

Neo4j Cypher Error: Don’t Know How to Add Double and String [Code Snippets]

I recently upgraded a Neo4j-backed application from Neo4j 3.2 to Neo4j 3.3 and came across an interesting change in behavior around type coercion, which led to my application throwing a bunch of errors.

In Neo4j 3.2 and earlier, if you added a String to a Double, it would coerce the Double to a String and concatenate the values. The following would, therefore, be valid Cypher:

RETURN toFloat("1.0") + " Mark" AS result

╒══════════╕
│"result"  │
╞══════════╡
│"1.0 Mark"│
└──────────┘

This behavior has changed in the 3.3 series and will instead throw an exception:

RETURN toFloat("1.0") + " Mark"

Neo.ClientError.Statement.TypeError: Don't know how to add `Double(1.000000e+00)` and `String(" Mark")`

We can work around that by forcing our query to run in 3.2 mode:

CYPHER 3.2 RETURN toFloat("1.0") + " Mark" AS result

Or we can convert the Double to a String in our Cypher statement:

RETURN toString(toFloat("1.0")) + " Mark" AS result

Original Link

Keeping Properties Secret in Neo4j

We’re an open-source company with nothing to hide, but some of our customers have things they need to keep close to their chest. Sometimes, you don’t want everybody to have access to salary information or future predictions. Maybe you want to hide personally identifiable information (PII) or Health Insurance Portability and Accountability Act (HIPAA) data. In Neo4j 3.4, we are introducing more security controls, starting with role-based, database-wide property key blacklists. That’s a bit of a mouthful, so let’s walk through an example to see one of the ways it can be used. Imagine you are working in Area 51 and have to deal with very important information.

You want your boss, who has “top secret” access, to know the truth. You want your friend James from Area 50, who has “secret” access, to know a little less than the whole truth. You want those like Tim, who have “confidential” access, to know a little less than that, and finally, you want the public to know the least: only the unclassified information. We will need to edit the neo4j.conf file in the conf directory of our Neo4j installation, add a couple of lines, save the file, and restart Neo4j (on all cluster members):

dbms.security.property_level.enabled=true
dbms.security.property_level.blacklist=Secret=top_secret;Confidential=top_secret,secret;Unclassified=top_secret,secret,confidential

In Neo4j Enterprise Edition, you will need to create a few accounts. The format is:

CALL dbms.security.createUser(username, password, requirePasswordChange)

So, for example:

CALL dbms.security.createUser("james", "1234", false);
CALL dbms.security.createUser("tim", "5678", false);
CALL dbms.security.createUser("public", "password", false);

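Next, we will create security roles for these accounts:

CALL dbms.security.createRole("Secret");
CALL dbms.security.createRole("Confidential");
CALL dbms.security.createRole("Unclassified");
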
And we will add the roles to the users:

CALL dbms.security.addRoleToUser("Secret", "james");
CALL dbms.security.addRoleToUser("Confidential", "tim");
CALL dbms.security.addRoleToUser("Unclassified", "public");

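But before they can read anything from the database, they also need reader access:

CALL dbms.security.addRoleToUser("reader", "james");
CALL dbms.security.addRoleToUser("reader", "tim");
CALL dbms.security.addRoleToUser("reader", "public");
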
We can call listRoles to see how it looks:

CALL dbms.security.listRoles();

Now that everything is set, we will create a report on the existence of “Aliens”.

We have different versions of the truth, so we will create multiple properties to answer the question in our document:

CREATE (u:Document {name:'Aliens?', top_secret:'They hate us!', secret:'They like us!', confidential:'They exist!', public:'They do not exist.'})

We can query for the truth using COALESCE, which returns the first non-null property it finds. We are logged in as the Neo4j admin user, with full access, so when we ask:

MATCH (u:Document {name:'Aliens?'})
RETURN COALESCE (u.top_secret, u.secret, u.confidential, u.public) AS truth

We get the real answer: “They hate us!” Now, let’s disconnect and try a different account.

:server disconnect

Log back in as James, rerun the query, and we get “They like us!” Disconnect again, log back in as Tim, and you get “They exist!” One more time as user public, and you get “They do not exist.” Pretty neat, right? So, you can use this feature to keep sensitive data away from people and also show different levels of detail. If you want to try it out, you can get a pre-release version of Neo4j 3.4 here.
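To make the effect of the blacklist concrete, here is a minimal Java sketch that models it outside the database:

1. Parse a blacklist string in the same `role=keys;role=keys` shape used in the configuration.
2. Hide the listed keys for a role.
3. Apply the same COALESCE order as the query.

This only simulates the visible behavior described above. It is not how Neo4j implements property-level security, and the class and method names are hypothetical.

```java
import java.util.*;

// Hypothetical simulation of role-based property blacklists plus COALESCE.
class PropertyBlacklistDemo {
    // Parses "Role=a,b;Other=c" into role -> hidden property keys.
    static Map<String, Set<String>> parseBlacklist(String config) {
        Map<String, Set<String>> result = new HashMap<>();
        for (String entry : config.split(";")) {
            if (entry.isEmpty()) {
                continue;
            }
            String[] parts = entry.split("=", 2);
            result.put(parts[0].trim(), new HashSet<>(Arrays.asList(parts[1].split(","))));
        }
        return result;
    }

    // First non-null property the role may see, in the given order, like
    // COALESCE(u.top_secret, u.secret, u.confidential, u.public).
    static String coalesceVisible(Map<String, String> props, List<String> order, Set<String> hidden) {
        for (String key : order) {
            if (!hidden.contains(key) && props.get(key) != null) {
                return props.get(key);
            }
        }
        return null;
    }
}
```

A user whose roles have no blacklist entry, like the admin in the example, corresponds to an empty hidden set and sees "They hate us!".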

Original Link

Eclipse JNoSQL: A Quick Overview of Redis, Cassandra, Couchbase, and Neo4j

Eclipse JNoSQL is a framework that helps Java developers use Java EE with NoSQL databases to build scalable applications. In the NoSQL world, there are four types of databases: key-value, column, document, and graph. Each one has a particular purpose, level of scalability, and model complexity. The most straightforward model, key-value, is the most scalable; however, its modeling capabilities are limited. In this tutorial, you’ll connect to four different databases (Redis as key-value, Cassandra as column, Couchbase as document, and Neo4j as graph) using the same annotations.

Introduction to Databases

  1. Redis: Redis is a software project that implements data structure servers. It is open-source, is networked, is in-memory, and stores keys with optional durability.

  2. Cassandra: Apache Cassandra is a free and open-source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.

  3. Couchbase: Couchbase Server, originally known as Membase, is an open-source, distributed multi-model NoSQL document-oriented database software package that is optimized for interactive applications.

  4. Neo4j: Neo4j is a graph database management system developed by Neo4j, Inc. Described by its developers as an ACID-compliant transactional database with native graph storage and processing, Neo4j is the most popular graph database according to DB-Engines. Neo4j is available in a GPL3-licensed open-source Community Edition, with online backup and high availability extensions licensed under the terms of the Affero General Public License. Neo also licenses Neo4j with these extensions under closed-source commercial terms. Neo4j is implemented in Java and accessible from software written in other languages using the Cypher Query Language through a transactional HTTP endpoint or through the binary “bolt” protocol.


The first step is to install all the NoSQL databases. To make this process easier, we will install them all using Docker, with the commands below.



docker run --name redis-instance -p 6379:6379 -d redis


docker run -d --name cassandra-instance -p 9042:9042 cassandra


docker run -d --name couchbase-instance -p 8091-8094:8091-8094 -p 11210:11210 couchbase

Follow the instructions here.

CREATE PRIMARY INDEX index_gods on gods;


For this sample, we have the God entity with id, name, and power as fields. The mapping annotations follow the JPA style.

public class God {

    @Id
    private String id;

    @Column
    private String name;

    @Column
    private String power;

    //...
}

There is a Repository interface that implements the basic operations in the database. It also supports query methods: you declare a method such as findByName in the interface, and Eclipse JNoSQL implements it for you:

public interface GodRepository extends Repository<God, String> {

    Optional<God> findByName(String name);
}
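One common way to implement such repository interfaces at runtime is a dynamic proxy. The following hypothetical Java sketch answers `findByXxx` methods by deriving the field name from the method name and filtering an in-memory list. It is not Eclipse JNoSQL's implementation. `SimpleGod` and `SimpleGodRepository` are illustrative stand-ins for God and GodRepository.

```java
import java.lang.reflect.Proxy;
import java.util.*;

// Illustrative stand-in for the God entity.
class SimpleGod {
    final String id, name, power;

    SimpleGod(String id, String name, String power) {
        this.id = id;
        this.name = name;
        this.power = power;
    }

    String field(String f) {
        switch (f) {
            case "id": return id;
            case "name": return name;
            case "power": return power;
            default: throw new IllegalArgumentException("unknown field " + f);
        }
    }
}

// Illustrative stand-in for GodRepository: only query methods, no implementation.
interface SimpleGodRepository {
    Optional<SimpleGod> findByName(String name);
    Optional<SimpleGod> findByPower(String power);
}

class QueryMethodSketch {
    // Creates a repository whose findByXxx methods are interpreted from their names.
    static SimpleGodRepository create(List<SimpleGod> data) {
        return (SimpleGodRepository) Proxy.newProxyInstance(
            SimpleGodRepository.class.getClassLoader(),
            new Class<?>[]{SimpleGodRepository.class},
            (proxy, method, args) -> {
                String methodName = method.getName();
                if (!methodName.startsWith("findBy")) {
                    throw new UnsupportedOperationException(methodName);
                }
                // "findByName" -> "name"
                String field = Character.toLowerCase(methodName.charAt(6)) + methodName.substring(7);
                return data.stream()
                    .filter(g -> g.field(field).equals(args[0]))
                    .findFirst();
            });
    }
}
```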

Infrastructure Code

In a Maven project, we need to set up the project dependencies. Eclipse JNoSQL has two layers: a mapping layer, which plays a role similar to JPA, and a communication layer, which plays a role similar to JDBC. So, there is one mapping dependency for each NoSQL database type and one communication driver for each database.

<!-- Mapping dependency -->
<dependency>
    <groupId>org.jnosql.artemis</groupId>
    <artifactId>artemis-configuration</artifactId>
    <version>${artemis.version}</version>
</dependency>
<dependency>
    <groupId>org.jnosql.artemis</groupId>
    <artifactId>artemis-column</artifactId>
    <version>${artemis.version}</version>
</dependency>
<dependency>
    <groupId>org.jnosql.artemis</groupId>
    <artifactId>artemis-document</artifactId>
    <version>${artemis.version}</version>
</dependency>
<dependency>
    <groupId>org.jnosql.artemis</groupId>
    <artifactId>artemis-key-value</artifactId>
    <version>${artemis.version}</version>
</dependency>
<dependency>
    <groupId>org.jnosql.artemis</groupId>
    <artifactId>graph-extension</artifactId>
    <version>${artemis.version}</version>
</dependency>
<!-- Communication driver -->
<dependency>
    <groupId>org.jnosql.diana</groupId>
    <artifactId>redis-driver</artifactId>
    <version>${artemis.version}</version>
</dependency>
<dependency>
    <groupId>org.jnosql.diana</groupId>
    <artifactId>cassandra-driver</artifactId>
    <version>${artemis.version}</version>
</dependency>
<dependency>
    <groupId>org.jnosql.diana</groupId>
    <artifactId>couchbase-driver</artifactId>
    <version>${artemis.version}</version>
</dependency>
<!-- TinkerPop + Neo4J dependency -->
<dependency>
    <groupId>org.apache.tinkerpop</groupId>
    <artifactId>gremlin-core</artifactId>
    <version>${tinkerpop.version}</version>
</dependency>
<dependency>
    <groupId>com.steelbridgelabs.oss</groupId>
    <artifactId>neo4j-gremlin-bolt</artifactId>
    <version>0.2.27</version>
</dependency>
<dependency>
    <groupId>org.neo4j.driver</groupId>
    <artifactId>neo4j-java-driver</artifactId>
    <version>1.5.1</version>
</dependency>

There is also a configuration file, jnosql.json, that holds the user, password, and other settings for each NoSQL database.

[
  {
    "description": "The redis key-value configuration",
    "name": "key-value",
    "provider": "org.jnosql.diana.redis.key.RedisConfiguration",
    "settings": {
      "redis-master-host": "localhost",
      "redis-master-port": "6379"
    }
  },
  {
    "description": "The Cassandra column configuration",
    "name": "column",
    "provider": "org.jnosql.diana.cassandra.column.CassandraConfiguration",
    "settings": {
      "cassandra-host-1": "localhost",
      "cassandra-query-1": "CREATE KEYSPACE IF NOT EXISTS gods WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3};",
      "cassandra-query-2": "CREATE COLUMNFAMILY IF NOT EXISTS gods.god (\"_id\" text PRIMARY KEY, name text, power text);"
    }
  },
  {
    "description": "The couchbase document configuration",
    "name": "document",
    "provider": "org.jnosql.diana.couchbase.document.CouchbaseDocumentConfiguration",
    "settings": {
      "couchbase-host-1": "localhost",
      "couchbase-user": "root",
      "couchbase-password": "123456"
    }
  },
  {
    "description": "The Neo4J configuration",
    "name": "graph",
    "settings": {
      "url": "bolt://localhost:7687",
      "admin": "neo4j",
      "password": "admin"
    }
  }
]

With the configuration done, the next step is to make the manager available to CDI so that it can be injected. Eclipse JNoSQL has the ConfigurationUnit annotation, which reads from the jnosql.json file. Take the key-value configuration as an example:

public class BucketManagerProducer {

    private static final String HEROES = "gods";

    @Inject
    @ConfigurationUnit(name = "key-value")
    private BucketManagerFactory<BucketManager> bucketManager;

    @Produces
    @ApplicationScoped
    public BucketManager getBucketManager() {
        return bucketManager.getBucketManager(HEROES);
    }

    public void close(@Disposes BucketManager bucketManager) {
        bucketManager.close();
    }
}


Key-value is the simplest NoSQL model, and all access is by key. In this example, we’re using the Redis implementation. The KeyValueTemplate follows the template method pattern, providing a skeleton for key-value operations.

public class KeyValueTemplateApp {

    public static void main(String[] args) throws InterruptedException {
        try (SeContainer container = SeContainerInitializer.newInstance().initialize()) {
            KeyValueTemplate template = container.select(KeyValueTemplate.class).get();
            God diana = builder().withId("diana").withName("Diana").withPower("hunt").builder();
            template.put(diana);
            Optional<God> result = template.get("diana", God.class);
            result.ifPresent(System.out::println);
            // Store again with a one-second time-to-live, then read after it has expired
            template.put(diana, Duration.ofSeconds(1));
            Thread.sleep(2_000L);
            System.out.println(template.get("diana", God.class));
        }
    }
}
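The second put passes a Duration, so the entry is expected to expire after one second, and the final get, made after a two-second sleep, should come back empty. The following hypothetical Java sketch models that expiry rule with an in-memory map and an injectable clock, so it can be checked without sleeping. It is not how JNoSQL or Redis implement time-to-live. In this sketch, a put without a TTL clears any earlier expiry, which is also how a plain Redis SET behaves.

```java
import java.time.Duration;
import java.util.*;
import java.util.function.LongSupplier;

// Hypothetical in-memory key-value store with optional time-to-live.
class TtlStore<V> {
    private final Map<String, V> values = new HashMap<>();
    private final Map<String, Long> expiresAt = new HashMap<>();
    private final LongSupplier clockMillis; // injectable clock for testing

    TtlStore(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    void put(String key, V value) {
        values.put(key, value);
        expiresAt.remove(key); // no TTL: keep until overwritten
    }

    void put(String key, V value, Duration ttl) {
        values.put(key, value);
        expiresAt.put(key, clockMillis.getAsLong() + ttl.toMillis());
    }

    Optional<V> get(String key) {
        Long deadline = expiresAt.get(key);
        if (deadline != null && clockMillis.getAsLong() >= deadline) {
            // Lazily evict expired entries on read.
            values.remove(key);
            expiresAt.remove(key);
        }
        return Optional.ofNullable(values.get(key));
    }
}
```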

It also supports the Repository interface; however, since key-value access is key-based, only lookups by key (such as findById) are supported, not query methods like findByName.

public class KeyValueRepositoryApp {

    public static void main(String[] args) throws InterruptedException {
        try (SeContainer container = SeContainerInitializer.newInstance().initialize()) {
            GodRepository repository = container.select(GodRepository.class, DatabaseQualifier.ofKeyValue()).get();
            God diana = builder().withId("diana").withName("Diana").withPower("hunt").builder();
            repository.save(diana);
            Optional<God> result = repository.findById("diana");
            result.ifPresent(System.out::println);
        }
    }
}


The column type is also key-based, although some implementations allow searching by other columns, as Cassandra does once a secondary index is added. Like the key-value type, the column type has a template for its operations: ColumnTemplate.

public class ColumnTemplateApp {

    public static void main(String[] args) throws InterruptedException {
        try (SeContainer container = SeContainerInitializer.newInstance().initialize()) {
            ColumnTemplate template = container.select(ColumnTemplate.class).get();
            God diana = builder().withId("diana").withName("Diana").withPower("hunt").builder();
            template.insert(diana);
            ColumnQuery query = select().from("god").where("_id").eq("diana").build();
            List<God> result = template.select(query);
            result.forEach(System.out::println);
            // insert again with a one-second TTL, then query after it has expired
            template.insert(diana, Duration.ofSeconds(1));
            Thread.sleep(2_000L);
            System.out.println(template.select(query));
        }
    }
}
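To see why the column type is key-based and what a secondary index changes, here is a minimal sketch of a column family as a map from row key to columns, with an optional index on one column. `ColumnFamily` and its methods are hypothetical, not the JNoSQL or Cassandra API, and real Cassandra indexes are considerably more involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical column family: lookups by row key always work,
// lookups by another column only work once that column is indexed.
class ColumnFamily {
    // row key -> (column name -> value)
    private final Map<String, Map<String, String>> rows = new HashMap<>();
    // indexed column -> (column value -> row keys holding that value)
    private final Map<String, Map<String, Set<String>>> indexes = new HashMap<>();

    // Declare a secondary index; existing rows are indexed immediately
    void createIndex(String column) {
        Map<String, Set<String>> index = new HashMap<>();
        rows.forEach((key, columns) -> {
            String value = columns.get(column);
            if (value != null) {
                index.computeIfAbsent(value, v -> new HashSet<>()).add(key);
            }
        });
        indexes.put(column, index);
    }

    // Insert or overwrite a row, keeping every index up to date
    void insert(String rowKey, Map<String, String> columns) {
        Map<String, String> old = rows.put(rowKey, new HashMap<>(columns));
        indexes.forEach((column, index) -> {
            if (old != null && old.get(column) != null) {
                Set<String> keys = index.get(old.get(column));
                if (keys != null) {
                    keys.remove(rowKey); // drop the stale index entry
                }
            }
            String value = columns.get(column);
            if (value != null) {
                index.computeIfAbsent(value, v -> new HashSet<>()).add(rowKey);
            }
        });
    }

    Optional<Map<String, String>> findByKey(String rowKey) {
        return Optional.ofNullable(rows.get(rowKey));
    }

    // Search by a non-key column: rejected unless a secondary index exists
    List<Map<String, String>> findBy(String column, String value) {
        Map<String, Set<String>> index = indexes.get(column);
        if (index == null) {
            throw new IllegalStateException("No secondary index on column " + column);
        }
        List<Map<String, String>> result = new ArrayList<>();
        for (String key : index.getOrDefault(value, Set.of())) {
            result.add(rows.get(key));
        }
        return result;
    }
}
```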

The column type also supports the repository:

public class ColumnRepositoryApp {

    public static void main(String[] args) {
        try (SeContainer container = SeContainerInitializer.newInstance().initialize()) {
            GodRepository repository = container.select(GodRepository.class, DatabaseQualifier.ofColumn()).get();
            God diana = builder().withId("diana").withName("Diana").withPower("hunt").builder();
            repository.save(diana);
            Optional<God> result = repository.findById("diana");
            result.ifPresent(System.out::println);
        }
    }
}


The document type generally offers a more flexible approach to reading entities, because it can search by fields other than the key/ID. It has DocumentTemplate for document operations and, as with the column API, document queries are built with a fluent API.

public class DocumentTemplateApp {

    public static void main(String[] args) throws InterruptedException {
        try (SeContainer container = SeContainerInitializer.newInstance().initialize()) {
            DocumentTemplate template = container.select(DocumentTemplate.class).get();
            God diana = builder().withId("diana").withName("Diana").withPower("hunt").builder();
            template.insert(diana);
            // query by a regular field, not by the ID
            DocumentQuery query = select().from("god").where("name").eq("Diana").build();
            List<God> result = template.select(query);
            result.forEach(System.out::println);
            template.insert(diana, Duration.ofSeconds(1));
            Thread.sleep(2_000L);
            System.out.println(template.select(query));
        }
    }
}
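The fluent query `select().from("god").where("name").eq("Diana")` reads almost like SQL. A rough in-memory analogue, assuming documents are plain maps grouped by collection name and supporting equality only, shows what such a chain expresses. `DocQuery` and its `execute` method are hypothetical; this is not the JNoSQL query API.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical fluent query over in-memory documents (equality filter only)
class DocQuery {
    private String collection;
    private String field;
    private Object expected;

    static DocQuery select() {
        return new DocQuery();
    }

    DocQuery from(String collection) {
        this.collection = collection;
        return this;
    }

    DocQuery where(String field) {
        this.field = field;
        return this;
    }

    DocQuery eq(Object value) {
        this.expected = value;
        return this;
    }

    // Run against a store of collection name -> documents; no where() means "all documents"
    List<Map<String, Object>> execute(Map<String, List<Map<String, Object>>> store) {
        return store.getOrDefault(collection, List.of()).stream()
                .filter(doc -> field == null || Objects.equals(doc.get(field), expected))
                .collect(Collectors.toList());
    }
}
```

Unlike the key-value and column sketches, nothing here requires the filtered field to be the key, which is the flexibility the paragraph above attributes to the document type.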

Another advantage of the document type is that it can find entities by ID without any secondary index.

public class DocumentRepositoryApp {

    public static void main(String[] args) throws InterruptedException {
        try (SeContainer container = SeContainerInitializer.newInstance().initialize()) {
            GodRepository repository = container.select(GodRepository.class, DatabaseQualifier.ofDocument()).get();
            God diana = builder().withId("diana").withName("Diana").withPower("hunt").builder();
            repository.save(diana);
            Optional<God> result = repository.findById("diana");
            result.ifPresent(System.out::println);
        }
    }
}


The graph type allows more complex entities and the deepest relationships, which can have properties and a direction. To represent them, it has a dedicated object: the edge. Unlike the other types, Eclipse JNoSQL doesn’t create its own graph communication API, because one already exists in Apache TinkerPop. Instead, it provides a mapping API tightly integrated with this Apache framework:

public class GraphTemplateApp {

    public static void main(String[] args) {
        try (SeContainer container = SeContainerInitializer.newInstance().initialize()) {
            GraphTemplate template = container.select(GraphTemplate.class).get();
            Graph graph = container.select(Graph.class).get();
            God diana = builder().withId("diana").withName("Diana").withPower("hunt").builder();
            template.insert(diana);
            graph.tx().commit();
            // TinkerPop traversal: vertices labeled God whose name is Diana
            Optional<God> result = template.getTraversalVertex().hasLabel(God.class).has("name", "Diana").next();
            result.ifPresent(System.out::println);
        }
    }
}
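What sets the graph type apart is that the relationship itself is a first-class object with a label, a direction, and its own properties. In TinkerPop terms, an edge goes out of its out-vertex and into its in-vertex. The classes below are hypothetical stand-ins used only to illustrate that structure; they are not TinkerPop's `Vertex` and `Edge` interfaces.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal property graph vertex
class Vertex {
    final String label;
    final Map<String, Object> properties = new HashMap<>();
    final List<Edge> outEdges = new ArrayList<>(); // edges starting here
    final List<Edge> inEdges = new ArrayList<>();  // edges pointing here

    Vertex(String label) {
        this.label = label;
    }
}

// Hypothetical directed, labeled edge that carries its own properties
class Edge {
    final String label;
    final Vertex out; // tail: where the edge starts
    final Vertex in;  // head: where the edge points
    final Map<String, Object> properties = new HashMap<>();

    private Edge(Vertex out, String label, Vertex in) {
        this.out = out;
        this.label = label;
        this.in = in;
    }

    // Create the edge and register it on both endpoints so it can be walked either way
    static Edge connect(Vertex out, String label, Vertex in) {
        Edge edge = new Edge(out, label, in);
        out.outEdges.add(edge);
        in.inEdges.add(edge);
        return edge;
    }
}
```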

The Graph API offers a repository to query using Gremlin.

public class GraphRepositoryApp {

    public static void main(String[] args) {
        try (SeContainer container = SeContainerInitializer.newInstance().initialize()) {
            GodRepository repository = container.select(GodRepository.class, DatabaseQualifier.ofGraph()).get();
            Graph graph = container.select(Graph.class).get();
            God diana = builder().withName("Diana").withPower("hunt").builder();
            repository.save(diana);
            graph.tx().commit();
            Optional<God> result = repository.findByName("Diana");
            result.ifPresent(System.out::println);
        }
    }
}

Original Link

This Week in Neo4j: JavaScript CRUD Apps, Personalized Recommendation Engines, Graph Theory Tutorial

Welcome to this week in Neo4j, where we round up what’s been happening in the world of graph databases in the last seven days.

This week, we’ve got real-time food and event recommendation engines, a JavaScript OGM, a Neo4j Operational Dashboard, and more!

Featured Community Member: Meredith Broussard

This week’s featured community member is Meredith Broussard, Assistant Professor at New York University, with a focus on data-driven reporting, computational journalism, and data visualization.

Meredith has presented Neo4j workshops at NICAR 2017, showing attendees how to find connections in campaign finance data, and again in 2018, this time with a focus on social network analysis.

On behalf of the Neo4j and data journalism communities, thanks for all your work, Meredith!

Recommendation Engines for Food Recipes and Events

This week, we have two stories about real-time recommendation engines: a use case where graph databases excel.

Irene Iriarte Carretero, last week’s featured community member, was interviewed by diginomica after her GraphTour London talk last week.

Irene explains how Gousto is using Neo4j to build a personalized recipe recommendation engine that takes “the subjective aspect” of cooking into account.

Suprfanz’s Jennifer Webb presented “Data science in practice: Examining events in social media” at the Strata Data Conference in San Jose.

In the talk, Jennifer shows how to build a recommendation engine for event promoters, starting from the community graph and using graph algorithms to find influencers. You can download the slides from Jennifer’s talk.

Neo4j Operational Dashboard, JavaScript OGM, Graphs for Identity

Geek Out: Graph Theory Tutorial

I came across Michel Caradec’s excellent workshop about implementing graph theory with Neo4j.

Michel set himself the challenge of implementing graph theory concepts using pure Cypher, and in the tutorial, he shows how to create random graphs, extract subgraphs, generate adjacency matrices, and more.

If you geek out on graph theory, you’re going to love this tutorial.

Tweet of the Week


That’s all for this week!

Original Link

Graph Gopher: The Neo4j Browser Built on Swift for Your iOS Device

Graph Gopher is a Neo4j browser for iPhone that was recently released to the App Store.

Graph Gopher lets you interact natively with your Neo4j graphs through easy browsing and quick entry for new nodes and relationships. It gives you a full Cypher client at your fingertips and fast editing of your existing data.

  • Start by browsing labeled nodes or relationships or their property keys.
  • A full Cypher client ready at your fingertips.
  • Quickly add new nodes and relationships.
  • Easily edit nodes and relationships.
  • See what relationships share a common property key, in this case, createdDate.

How Graph Gopher Got Started

Graph Gopher came out of a few questions I explored. First of all, I was exploring different ways to browse the graphs stored in my Neo4j graph database. The graph visualization of a Cypher query, the approach we know from the Neo4j web interface, was an alternative, but I thought it required quite a bit from the user to start exploring, and it was perhaps not as good a fit on a phone-sized device.

After spending a lot of time trying to adapt that, I found that the classic navigation interface was one I thought worked well for exploring the graph. To me, the navigation interface looks a lot like Gopher, the navigation paradigm we used to explore the internet before web browsers, and hence the name was born.

Building Graph Gopher in Swift

The second road to Graph Gopher was that Swift – a language used to write iOS apps – had become open source, and it was starting to be used to write server applications. While databases like MySQL and SQLite were available and used by many, Neo4j was absent.

I knew I could do something about that, and joined Cory Wiles’s Theo project in late 2016. After completing the Swift 3.0 transition together with him, I implemented Bolt support for 3.1 and 3.2.

For version 4.0, I improved the API, made it support Swift 4, and made it a lot easier to use. I used the development of Graph Gopher to validate the work done there, and Graph Gopher is a great demonstration of what you can do with Theo. Along the way, other developers started using the betas of Theo 4, giving me great feedback.

Faster Than the Neo4j Browser and Available Wherever You Need It

An ambition for Graph Gopher was to be way faster to load and use than opening the web interface in a browser tab and interacting with your Neo4j instance that way. In practice, the browser tab has been no match: Graph Gopher is a very convenient tool. Even though I use a Mac all through my working day, I still access my Neo4j instances primarily through Graph Gopher.

The exception to this is when I write longer Cypher statements as part of my development work, but I have gotten good feedback on how to improve this. Look forward to updates here in the coming versions.

In practice, Graph Gopher makes it so that you always have your Neo4j instance available to you. It helps you add or edit nodes and relationships, prototype ideas and look up queries from your couch, coming out of the shower, on the train, or wherever you are. That is wonderfully powerful.

Another important feature is multi-device support. I use both an iPhone and an iPad, and I know people will use it on both work and private devices. Therefore it was important to me that session configuration was effortlessly transferred between devices, as well as favorite nodes. This has been implemented using iCloud so that if you add a new instance configuration on one device, it will be available to all devices using the same iCloud account.

A challenge unique to mobile devices is connectivity, and a lot of work went into helping Graph Gopher keep a stable connection over flaky networks. If the connection still drops, Graph Gopher reconnects so that you can continue working where you left off.

The Future of Graph Gopher

The road forward with Graph Gopher will be exciting. Now that it is out, I get contacted by people in situations I hadn’t imagined at all. Where people use it will be the primary driver of what features get added and how it will evolve. I would absolutely love to hear back from you how you use it, or how you would like to use it.

Original Link

Neo4j: A Reasonable RDF Graph Database and Reasoning Engine

It is widely known that Neo4j is able to load and write RDF. Until now, RDF and OWL reasoning have been attributed to fully fledged triple stores or dedicated reasoning engines only. This post shows that Neo4j can be extended by a unique reasoning technology to deliver a very expressive and highly competitive reasoning engine for RDF, RDFS, and OWL 2 RL. I will briefly illustrate the approach and provide some benchmark results.

Labeled property graphs (LPG) and the resource description framework (RDF) have a common ground: both consider data as a graph. Not surprisingly, there are ways of converting one format into the other, as recently demonstrated nicely by Jesús Barrasa from Neo4j for the Thomson Reuters PermID RDF dataset.

If you insist on differences between LPG and RDF, then consider the varying abilities to represent schema information and reasoning.

In Neo4j 2.0, node labels were introduced for typing nodes to optionally encode a lightweight type schema for a graph. Broadly speaking, RDF Schema (RDFS) extends this approach more formally. RDFS allows structuring labels of nodes (called classes in RDF) and relationships (called properties) in hierarchies. On top of this, the Web Ontology Language (OWL) provides a language to express rule-like conditions to automatically derive new facts such as node labels or relationships.

Reasoning Enriches Data With Knowledge

For a quick dive into the world of rules and OWL reasoning, let’s consider the very popular LUBM benchmark (Lehigh University Benchmark).

The benchmark consists of artificially generated graph data in a fictional university domain and deals with people, departments, courses, etc. As an example, a student is derived to be an attendee if he or she takes some course, that is, when he or she matches the following ontological rule:

Student and (takesCourse some) SubClassOf Attendee

This rule has to be read as follows when translated into LPG lingo: every node with label Student that has some relationship of type takesCourse to some other node will receive the label Attendee. Any experienced Neo4j programmer may rub his or her hands, since this rule can be translated straightforwardly into the following Cypher expression:

match (x:Student)-[:takesCourse]->()
set x:Attendee

That is perfectly possible but could become cumbersome in the case of deeply nested rules that may also depend on each other. For instance, the Cypher expression misses the subclasses of Student, such as UndergraduateStudent. Strictly speaking, the expression above should therefore read:

match (x)-[:takesCourse]->() where x:Student or x:UndergraduateStudent
set x:Attendee
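The manual fix above hard-codes one subclass. A reasoner instead derives the full set of Student subclasses from the class hierarchy and then applies the rule to any node carrying one of them. The sketch below does this for this single rule over an in-memory graph; `Node`, `AttendeeRule`, and the hierarchy map are hypothetical, and a real OWL 2 RL reasoner evaluates many interacting rules, not just this one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory node: labels plus outgoing relationships grouped by type
class Node {
    final Set<String> labels = new HashSet<>();
    final Map<String, List<Node>> out = new HashMap<>();

    Node(String... labels) {
        this.labels.addAll(Arrays.asList(labels));
    }

    void relate(String type, Node target) {
        out.computeIfAbsent(type, t -> new ArrayList<>()).add(target);
    }
}

class AttendeeRule {
    // All classes that are (transitively) subclasses of root, root included.
    // subClassOf maps each class to its direct superclasses.
    static Set<String> subclassesOf(String root, Map<String, Set<String>> subClassOf) {
        Set<String> result = new HashSet<>(Set.of(root));
        boolean changed = true;
        while (changed) { // repeat until a fixpoint is reached
            changed = false;
            for (Map.Entry<String, Set<String>> e : subClassOf.entrySet()) {
                if (!result.contains(e.getKey()) && !Collections.disjoint(e.getValue(), result)) {
                    result.add(e.getKey());
                    changed = true;
                }
            }
        }
        return result;
    }

    // Student and (takesCourse some) SubClassOf Attendee; returns the number of new Attendee labels
    static int apply(List<Node> graph, Map<String, Set<String>> subClassOf) {
        Set<String> studentClasses = subclassesOf("Student", subClassOf);
        int derived = 0;
        for (Node n : graph) {
            boolean isStudent = !Collections.disjoint(n.labels, studentClasses);
            boolean takesCourse = !n.out.getOrDefault("takesCourse", List.of()).isEmpty();
            if (isStudent && takesCourse && n.labels.add("Attendee")) {
                derived++;
            }
        }
        return derived;
    }
}
```

Running `apply` a second time derives nothing new, which is the fixpoint behavior a materializing reasoner relies on.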

It’s obviously more convenient to encode such domain knowledge as an ontological rule with the support of an ontology editor such as Protégé and an OWL reasoning engine that takes care of executing them.

Another nice thing about RDFS/OWL is that modeling such knowledge is on a very declarative level that is standardized by W3C. In addition, the OWL language bears some important properties such as soundness and completeness.

For instance, you can never define a non-terminating rule set, and reasoning will instantly identify any conflicting rules. In case of OWL 2 RL, it is furthermore guaranteed that all derivable facts can be derived in polynomial time (theoretical worst case) with respect to the size of the graph.

Of course, in practice performance can vary a lot. In the case of our Attendee example, a reasoner, whether a triple store rule engine or the Cypher engine, has to loop over the graph nodes with label Student and check for takesCourse relationships.

To tweak performance, one could use dedicated indexes to effectively select nodes with particular relations (resp. relation degree) or labels, as well as use stored procedures. At the end of the day, it seems that this does not scale well: when doubling the data, you double the number of graph reads and writes to compute the consequences of such rules.

The good news is that this is not the end of the story.

Efficient Reasoning for Graph Storage

There is a technology called GraphScale that empowers Neo4j with scalable OWL reasoning. The approach is based on an abstraction refinement technique that builds a compact representation of the graph suitable for in-memory reasoning. Reasoning consequences are then incrementally propagated back to the underlying graph store.

The idea behind GraphScale is based on the observation that entities within a graph often have a similar structure. The GraphScale approach takes advantage of these similarities and computes a condensed version of the original data called an abstraction.

This abstraction is based on equivalence groups of nodes that share a similar structure according to well-defined logical criteria. This technique is proven to be sound and complete for all of RDF, RDFS, and OWL 2 RL.


Here is an intuitive idea of the approach. Consider the graph above as a fraction of the original data about the university domain in Neo4j. On the right, there is a compact representation of the undergraduate students that take at least some course.

In essence, the derived fact that those students are attendees implicitly holds for all source nodes in the original graph. In other words, there is some one-to-many relationship from derived facts in the compact representation to nodes in the original graph.
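The one-to-many propagation can be illustrated with a deliberately simplified grouping. The criterion below (same label set and same outgoing relationship types) is an assumption chosen for illustration, not GraphScale's actual equivalence criterion, and it is only adequate for the single Attendee rule: nodes are grouped, the rule is evaluated once per group representative, and the derived label is copied back to every member.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AbstractionSketch {
    // A node reduced to the features this sketch's grouping looks at
    static final class Node {
        final String id;
        final Set<String> labels;
        final Set<String> outTypes; // types of outgoing relationships

        Node(String id, Set<String> labels, Set<String> outTypes) {
            this.id = id;
            this.labels = new HashSet<>(labels);
            this.outTypes = Set.copyOf(outTypes);
        }
    }

    // Group key (assumed criterion): a snapshot of the labels plus the outgoing relationship types
    static List<Object> signature(Node n) {
        return List.of(Set.copyOf(n.labels), n.outTypes);
    }

    // Evaluate "Student and (takesCourse some) SubClassOf Attendee" once per group and
    // propagate the result to all members. Returns the number of groups evaluated.
    static int materialize(List<Node> nodes) {
        Map<List<Object>, List<Node>> groups = new HashMap<>();
        for (Node n : nodes) {
            groups.computeIfAbsent(signature(n), k -> new ArrayList<>()).add(n);
        }
        for (List<Node> members : groups.values()) {
            Node representative = members.get(0);
            boolean derive = representative.labels.contains("Student")
                    && representative.outTypes.contains("takesCourse");
            if (derive) {
                members.forEach(m -> m.labels.add("Attendee")); // one derived fact, many nodes
            }
        }
        return groups.size();
    }
}
```

With 1,000 structurally identical students plus one student without courses and one course, the rule is checked for three groups rather than 1,002 individual nodes.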

Reasoning and Querying Neo4j With GraphScale

Let’s look at some performance results with data of increasing size from the LUBM test suite.

The following chart depicts the time to derive all derivable facts (called materialization) with GraphScale on top of Neo4j (excluding loading times) for 50, 100, and 250 universities. Compared to other secondary-storage systems with reasoning capabilities, the Neo4j-GraphScale duo shows a much lower growth in reasoning time as the data increases than any other system (schema and data files can be found at the bottom of this post).

A benchmark of GraphScale + Neo4j using the LUBM test suite

Experience has shown that materialization is key to efficient querying in a real-world setting. Without upfront materialization, a reasoning-aware triple store has to temporarily derive all answers and relevant facts for every single query on demand. Consequently, this comes with a performance penalty and typically fails on non-trivial rule sets.

Since the Neo4j graph database is not a triple store, it is not equipped with a SPARQL query engine. However, Neo4j offers Cypher, and for many semantic applications it should be possible to translate SPARQL queries to Cypher.

From a user perspective, this integrates two technologies into one platform: a transactional graph analytics system as well as an RDFS/OWL reasoning engine able to service sophisticated semantic applications via Cypher over a materialized graph in Neo4j.

As a proof of concept, let’s consider SPARQL query number nine from the LUBM test suite, which turned out to be one of the most challenging of the 14 given queries. The query asks for students and their advisors who teach courses taken by those students: a triangular relationship pattern over most of the dataset:

SELECT ?X ?Y ?Z {
  ?X rdf:type Student .
  ?Y rdf:type Faculty .
  ?Z rdf:type Course .
  ?X advisor ?Y .
  ?Y teacherOf ?Z .
  ?X takesCourse ?Z
}

Under the assumption of a fully materialized graph, this SPARQL query translates into the following Cypher query:

MATCH (x:Student)-[:takesCourse]->(z:Course), (x)-[:advisor]->(y:Faculty)-[:teacherOf]->(z)
RETURN x, y, z

Without a doubt, the Neo4j Cypher engine delivers competitive query performance on the previous datasets (times for query nine and for its count(*) version, respectively). Triple store A is not listed since it is a pure in-memory system without secondary storage persistence.

Benchmark data between Neo4j + Cypher + GraphScale vs. a triple store

There is more potential in the marriage of Neo4j and the GraphScale technology. In fact, the graph abstraction can be very helpful as an index for query answering. For instance, you can instantly read from the abstraction whether there are some data matching query patterns of kind (x:)-[:]->().

Bottom line: I fully agree with George Anadiotis’ statement that labeled property graphs and RDF/OWL are close relatives.

In a follow-up blog post, I will present an interactive visual exploration and querying tool for RDF graphs that utilizes the compact representation described above as an index to deliver a distinguished user experience and performance on large graphs.



  • GraphScale: Adding Expressive Reasoning to Semantic Data Stores. Demo Proceedings of the 14th International Semantic Web Conference (ISWC 2015):
  • Abstraction refinement for scalable type reasoning in ontology-based data repositories: EP 2 966 600 A1 & US 2016/0004965 A1


Original Link

Mixing Specified and Unspecified Group Belongings in a Single Import Isn’t Supported

I’ve been working with the Neo4j Import Tool recently after a bit of a break and ran into an interesting error message that I initially didn’t understand.

I had some CSV files containing nodes that I wanted to import into Neo4j. Their contents look like this:

$ cat people_header.csv
name:ID(Person)

$ cat people.csv
"Mark"
"Michael"
"Ryan"
"Will"
"Jennifer"
"Karin"

$ cat companies_header.csv
name:ID(Company)

$ cat companies.csv
"Neo4j"

I find it easier to use separate header files because I often make typos with my column names and it’s easier to update a single line file than to open a multi-million line file and change the first line.

I ran the following command to create a new Neo4j database from these files:

$ ./bin/neo4j-admin import \
--database=blog.db \
--mode=csv \
--nodes:Person people_header.csv,people.csv \
--nodes:Company companies_heade.csv,companies.csv

Which resulted in this error message:

Neo4j version: 3.3.3
Importing the contents of these files into /Users/markneedham/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-b59e33d5-2060-4a5d-bdb8-0b9f6dc919fa/installation-3.3.3/data/databases/blog.db:
Nodes:
  :Person
  /Users/markneedham/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-b59e33d5-2060-4a5d-bdb8-0b9f6dc919fa/installation-3.3.3/people_header.csv
  /Users/markneedham/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-b59e33d5-2060-4a5d-bdb8-0b9f6dc919fa/installation-3.3.3/people.csv

  :Company
  /Users/markneedham/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-b59e33d5-2060-4a5d-bdb8-0b9f6dc919fa/installation-3.3.3/companies.csv

...
Import error: Mixing specified and unspecified group belongings in a single import isn't supported
Caused by:Mixing specified and unspecified group belongings in a single import isn't supported
java.lang.IllegalStateException: Mixing specified and unspecified group belongings in a single import isn't supported
at org.neo4j.unsafe.impl.batchimport.input.Groups.getOrCreate(
at org.neo4j.unsafe.impl.batchimport.input.csv.InputNodeDeserialization.initialize(
at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.initialize(
at org.neo4j.unsafe.impl.batchimport.input.csv.ParallelInputEntityDeserializer.lambda$new$0(
at org.neo4j.unsafe.impl.batchimport.staging.TicketedProcessing.lambda$submit$1(
at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$

The output actually helpfully indicates which files it’s importing from and we can see under the :Company section that the header file is missing.

As a result of the typo that I made when trying to type companies_header.csv, the tool treats the first line of companies.csv as the header, and since we haven’t specified a group (such as Company or Person) on that line, we receive this error.
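The rule behind the message can be modeled in a few lines: either every node input declares an ID group in its header, or none does. The sketch below is a hypothetical reimplementation for illustration only, not Neo4j's `Groups` class; it extracts the group from an `ID(...)` column and rejects a mix of specified and unspecified groups, which is exactly what happens when `"Neo4j"` is mistaken for a header.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImportGroups {
    // Matches an ID column such as "name:ID(Person)"; the group in parentheses is optional
    private static final Pattern ID_COLUMN = Pattern.compile(":ID(?:\\(([^)]*)\\))?");

    // Declared ID group of a header line, or empty if the header names no group
    static Optional<String> groupOf(String headerLine) {
        for (String column : headerLine.split(",")) {
            Matcher m = ID_COLUMN.matcher(column.trim());
            if (m.find()) {
                return Optional.ofNullable(m.group(1)); // "id:ID" without a group -> empty
            }
        }
        return Optional.empty(); // no ID column at all, e.g. a data row read as a header
    }

    // Reject imports where some inputs name a group and others do not
    static void validate(List<String> headerLines) {
        long specified = headerLines.stream().filter(h -> groupOf(h).isPresent()).count();
        if (specified != 0 && specified != headerLines.size()) {
            throw new IllegalStateException(
                    "Mixing specified and unspecified group belongings in a single import isn't supported");
        }
    }
}
```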

Let’s fix the typo and try again:

$ ./bin/neo4j-admin import \
--database=blog.db \
--mode=csv \
--nodes:Person people_header.csv,people.csv \
--nodes:Company companies_header.csv,companies.csv

Neo4j version: 3.3.3
Importing the contents of these files into /Users/markneedham/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-b59e33d5-2060-4a5d-bdb8-0b9f6dc919fa/installation-3.3.3/data/databases/blog.db:
Nodes:
  :Person
  /Users/markneedham/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-b59e33d5-2060-4a5d-bdb8-0b9f6dc919fa/installation-3.3.3/people_header.csv
  /Users/markneedham/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-b59e33d5-2060-4a5d-bdb8-0b9f6dc919fa/installation-3.3.3/people.csv

  :Company
  /Users/markneedham/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-b59e33d5-2060-4a5d-bdb8-0b9f6dc919fa/installation-3.3.3/companies_header.csv
  /Users/markneedham/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-b59e33d5-2060-4a5d-bdb8-0b9f6dc919fa/installation-3.3.3/companies.csv

...
IMPORT DONE in 1s 5ms.
Imported:
  7 nodes
  0 relationships
  7 properties
Peak memory usage: 480.00 MB


Original Link

What’s Waiting for You in the Latest Release of the APOC Library [March 2018]

The last release of the APOC library was just before GraphConnect New York, and in the meantime, quite a lot of new features made their way into our little standard library.

We also crossed 500 GitHub stars; thanks, everyone, for giving us a nod!

What’s New in the Latest APOC Release


If you haven’t used APOC yet, you have one less excuse: it just became much easier to try. In Neo4j Desktop, just navigate to the Plugins tab of your Manage Database view, and click Install for APOC. Then your database is restarted, and you’re ready to rock.

APOC wouldn’t be where it is today without the countless people contributing, reporting ideas and issues and everyone telling their friends. Please keep up the good work.

I also added a code of conduct and contribution guidelines to APOC, so every contributor feels welcome and safe and also quickly knows how to join our efforts.

For this release again, our friends at LARUS BA did a lot of the work. Besides many bug fixes, Angelo Busato also added S3 URL support, which is really cool. Andrea Santurbano also worked on the HDFS support (read/write).

With these, you can use S3 and HDFS URLs in every procedure that loads data, like apoc.load.json/csv/xml/graphml, apoc.cypher.runFile, etc. Writing to HDFS is possible with all the export functions, like apoc.export.cypher/csv/graphml.

Andrew Bowman worked on a number of improvements around path expanders, including:

  • Added support for repeating sequences of labels and/or rel-types to express more complex paths.
  • Support for known end nodes (instead of end nodes based only on labels).
  • Support for compound labels (such as :Person:Manager).

I also found some time to code and added a bunch of things. 

Aggregation Functions

I wanted to add aggregation functions all the way back to Neo4j 3.2 after Pontus added the capability, but I just never got around to it. Below is one of the patterns that we used to use to get the first (few) elements of a collect, which is quite inefficient because the full collect list is built up even if you’re just interested in the first element:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p,m ORDER BY m.released
RETURN p, collect(m)[0] as firstMovie

Now, you can just use:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p,m ORDER BY m.released
RETURN p, apoc.agg.first(m) as firstMovie
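The difference between the two queries is memory: `collect(m)[0]` builds the complete list for each person before taking element 0, while a dedicated aggregation only needs to remember one value. Because of the `WITH ... ORDER BY m.released`, rows already arrive in release order. The plain-Java sketch below contrasts the two strategies; the class names are hypothetical and there is no Neo4j dependency or claim about APOC's internals.

```java
import java.util.ArrayList;
import java.util.List;

// Strategy 1: collect everything, then index; memory grows with the group size
class CollectThenIndex<T> {
    private final List<T> values = new ArrayList<>();

    void update(T value) {
        values.add(value);
    }

    T result() {
        return values.isEmpty() ? null : values.get(0);
    }

    int retained() {
        return values.size();
    }
}

// Strategy 2: remember only the first value seen; constant memory per group
class FirstAggregator<T> {
    private T first;
    private boolean seen;

    void update(T value) {
        if (!seen) {
            first = value;
            seen = true;
        }
    }

    T result() {
        return first;
    }
}
```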

There are also more statistics functions, including apoc.agg.statistics, which computes all statistics at once and returns a map with {min,max,sum,median,avg,stdev}. The other new aggregation functions include:

  • More efficient variants of collect(x)[a..b]
  • apoc.agg.nth, apoc.agg.first, apoc.agg.last, apoc.agg.slice
  • apoc.agg.median(x)
  • apoc.agg.percentiles(x,[0.5,0.9])
  • apoc.agg.product(x)
  • apoc.agg.statistics() provides a full numeric statistic
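As a rough picture of what a combined statistics aggregate returns, here is a sketch that produces the same six keys for a list of numbers. The exact definitions are assumptions: the median averages the two middle values for an even count, and `stdev` is the population standard deviation. APOC's own formulas, and any additional keys it returns, may differ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Stats {
    // min, max, sum, median, avg and stdev of a non-empty list of numbers
    static Map<String, Double> statistics(List<? extends Number> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("statistics of an empty list are not defined here");
        }
        double[] sorted = values.stream().mapToDouble(Number::doubleValue).sorted().toArray();
        int n = sorted.length;

        double sum = 0;
        for (double v : sorted) {
            sum += v;
        }
        double avg = sum / n;

        double squaredDiffs = 0;
        for (double v : sorted) {
            squaredDiffs += (v - avg) * (v - avg);
        }

        // assumed median definition: middle value, or mean of the two middle values
        double median = (n % 2 == 1)
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;

        Map<String, Double> result = new LinkedHashMap<>();
        result.put("min", sorted[0]);
        result.put("max", sorted[n - 1]);
        result.put("sum", sum);
        result.put("median", median);
        result.put("avg", avg);
        result.put("stdev", Math.sqrt(squaredDiffs / n)); // population standard deviation
        return result;
    }
}
```

For the values 4, 1, 3, 2 this yields min 1, max 4, sum 10, median 2.5, avg 2.5 and a stdev of sqrt(1.25).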


I also implemented an idea from my colleague Ryan Boyd to allow indexing of full “documents,” i.e. map structures per node or relationship that can also contain information from the neighborhood or computed data. Later, those documents can be searched by the keys and values of the indexed data.

MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WITH p, p {.name, .age, roles:r.roles, movies: collect(m.title) } as doc
CALL apoc.index.addNodeMap(p, doc);

Then, later you can search:

CALL apoc.index.nodes('Person','name:K* movies:Matrix roles:Neo');

apoc.index.addNodeMap(node, {map})
apoc.index.addRelationshipMap(relationship, {map})

As part of that work, I also wanted to add support for deconstructing complex values or structs, such as:

  • to select the values of a subset of keys into a mixed type list.
  • apoc.coll.elements is used to deconstruct a sublist into typed variables (this can also be done with WITH, but requires an extra declaration of the list to be concise).
RETURN{a:'foo', b:42, c:true}, ["a","c"]) -> ['foo', true]
CALL apoc.coll.elements([42, 'foo', person]) YIELD _1i as answer, _2s as name, _3n as person

Path Expander Sequences

You can now define repeating sequences of node labels or relationship types during expansion. Just use commas in the relationshipFilter and labelFilter config parameters to separate the filters that should apply for each step in the sequence.


The above will continue traversing only the given sequence of relationships.

labelFilter:'Person|Investor|-Cleared, Company|>Bank|/Government:Company'

All filter types are allowed in label sequences. The above repeats a sequence of a :Person or :Investor node (but not with a :Cleared label), and then a :Company, :Bank, or :Government:Company node (where :Bank nodes will act as end nodes of an expansion, and :Government:Company nodes will act as end nodes and terminate further expansion).

sequence:'Person|Investor|-Cleared, OWNS_STOCK_IN>, Company|>Bank|/Government:Company, <MANAGES, LIVES_WITH>|MARRIED_TO>|RELATED'

The new sequence config parameter above lets you define both the label filters and relationship filters to use for the repeating sequence (and ignores labelFilter and relationshipFilter if present).
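Conceptually, a repeating sequence assigns filter number `depth % n` to each expansion step: step 0 uses the first filter, and step n wraps around to the first filter again. The sketch below checks a path of traversed relationships against the relationship part of the `sequence` example above. It is a simplified, hypothetical model that only understands type names, `|` alternatives, a trailing `>` (outgoing) and a leading `<` (incoming); it is not APOC's implementation and ignores label filters and the other filter symbols.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SequenceFilter {
    enum Direction { OUTGOING, INCOMING }

    // One traversed relationship: its type and the direction it was followed in
    static final class Step {
        final String type;
        final Direction direction;

        Step(String type, Direction direction) {
            this.type = type;
            this.direction = direction;
        }
    }

    // One entry per comma-separated stage, each holding its "|" alternatives
    private final List<String[]> stages = new ArrayList<>();

    SequenceFilter(String sequence) {
        for (String stage : sequence.split(",")) {
            String[] alternatives = stage.trim().split("\\|");
            for (int i = 0; i < alternatives.length; i++) {
                alternatives[i] = alternatives[i].trim();
            }
            stages.add(alternatives);
        }
    }

    // Does one alternative such as "OWNS_STOCK_IN>" or "<MANAGES" accept this step?
    private static boolean accepts(String alternative, Step step) {
        if (alternative.endsWith(">")) {
            return step.direction == Direction.OUTGOING
                    && step.type.equals(alternative.substring(0, alternative.length() - 1));
        }
        if (alternative.startsWith("<")) {
            return step.direction == Direction.INCOMING
                    && step.type.equals(alternative.substring(1));
        }
        return step.type.equals(alternative); // no arrow: either direction
    }

    // The i-th step must satisfy stage i % stages.size(), so the sequence repeats
    boolean matches(List<Step> path) {
        for (int i = 0; i < path.size(); i++) {
            String[] alternatives = stages.get(i % stages.size());
            Step step = path.get(i);
            if (Arrays.stream(alternatives).noneMatch(a -> accepts(a, step))) {
                return false;
            }
        }
        return true;
    }
}
```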

Path Expansion Improvements

  • Compound labels (like Person:Manager) allowed in the label filter, applying only to nodes with all of the given labels.
  • endNodes and terminatorNodes config parameters for supplying a list of the actual nodes that should end each path during expansion (terminatorNodes end further expansion down the path, endNodes allow expansion to continue)
  • For labelFilter, the whitelist symbol + is now optional. Lack of a symbol is interpreted as a whitelisted label.
  • Some minor behavioral changes to the end node > and termination node / filters, specifically when it comes to whitelisting and behavior when below minLevel depth.

Path Functions

(This one came from a request in

  • apoc.path.create(startNode, [rels])
  • apoc.path.slice(path, offset, length)
  • apoc.path.combine(path1, path2)
MATCH (a:Person)-[r:ACTED_IN]->(m)
MATCH (m)<-[d:DIRECTED]-()
RETURN apoc.path.create(a, [r, d]) as path

MATCH path = (a:Root)<-[:PARENT_OF*..10]-(leaf)
RETURN apoc.path.slice(path, 2, 5) as subPath

MATCH firstLeg = shortestPath((start:City)-[:ROAD*..10]-(stop)),
      secondLeg = shortestPath((stop)-[:ROAD*..10]->(end:City))
RETURN apoc.path.combine(firstLeg, secondLeg) as route
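The slice and combine queries are easier to follow with a path pictured as nodes joined by relationships. In this hypothetical sketch a path is a list of node names plus the relationship types between them. The semantics are assumptions for illustration: `slice(offset, length)` keeps up to `length` relationships starting after `offset` relationships, and `combine` requires the second path to start where the first one ends. Check the APOC documentation for the exact behavior, for example with out-of-range offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical path model: always exactly one more node than relationships
final class SimplePath {
    final List<String> nodes;
    final List<String> rels;

    SimplePath(List<String> nodes, List<String> rels) {
        if (nodes.size() != rels.size() + 1) {
            throw new IllegalArgumentException("a path needs exactly one more node than relationships");
        }
        this.nodes = List.copyOf(nodes);
        this.rels = List.copyOf(rels);
    }

    int length() {
        return rels.size();
    }

    // Sub-path of up to `length` relationships, starting after `offset` relationships
    SimplePath slice(int offset, int length) {
        int from = Math.min(offset, rels.size());
        int to = Math.min(from + length, rels.size());
        return new SimplePath(nodes.subList(from, to + 1), rels.subList(from, to));
    }

    // Concatenate two paths that meet at a shared node
    SimplePath combine(SimplePath next) {
        if (!nodes.get(nodes.size() - 1).equals(next.nodes.get(0))) {
            throw new IllegalArgumentException("the second path must start where the first ends");
        }
        List<String> n = new ArrayList<>(nodes);
        n.addAll(next.nodes.subList(1, next.nodes.size())); // do not repeat the junction node
        List<String> r = new ArrayList<>(rels);
        r.addAll(next.rels);
        return new SimplePath(n, r);
    }
}
```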

Text Functions

  • apoc.text.code(codepoint), apoc.text.hexCharAt(), apoc.text.charAt() (thanks to Andrew Bowman)
  • apoc.text.bytes/apoc.text.byteCount (thanks to Jonatan for the idea)
  • apoc.text.toCypher(value, {}) for generating valid Cypher representations of nodes, relationships, paths, and values
  • Sørensen-Dice similarity (thanks, Florent Biville)
  • Roman/Arabic numeral conversions (thanks, Marcin Cylke)
  • New email and domain extraction functions (thanks, David Allen)
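For reference, the Sørensen-Dice coefficient of two strings is commonly computed over their sets of character bigrams as 2|X ∩ Y| / (|X| + |Y|). The sketch below uses that textbook definition; APOC's own tokenization (case handling, choice of n-grams) is not described here and may differ, and `Dice` is a hypothetical class name.

```java
import java.util.HashSet;
import java.util.Set;

class Dice {
    // Set of adjacent character pairs, e.g. "night" -> {ni, ig, gh, ht}
    static Set<String> bigrams(String s) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            result.add(s.substring(i, i + 2));
        }
        return result;
    }

    // 2 * |shared bigrams| / (|bigrams of a| + |bigrams of b|)
    static double similarity(String a, String b) {
        if (a.equals(b)) {
            return 1.0;
        }
        Set<String> x = bigrams(a);
        Set<String> y = bigrams(b);
        if (x.isEmpty() && y.isEmpty()) {
            return 0.0; // distinct strings too short to have bigrams
        }
        Set<String> common = new HashSet<>(x);
        common.retainAll(y);
        return 2.0 * common.size() / (x.size() + y.size());
    }
}
```

For example, "night" and "nacht" share only the bigram "ht" out of four bigrams each, giving 2 × 1 / 8 = 0.25.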

Data Integration

  • Generic XML import with apoc.import.xml() (thanks, Stefan Armbruster)
  • Pass Cypher parameters to apoc.export.csv.query
  • MongoDB integration (thanks, Gleb Belokrys)
  • Stream the apoc.export.cypher script back to the client when no file name is given
  • apoc.load.csv
    • Handling of converted null values and/or null columns
    • Explicit nullValues option to define values that will be replaced by null (global and per field)
    • Explicit results option to determine which output columns are provided

Collection Functions

  • apoc.coll.combinations(), apoc.coll.frequencies() (thanks, Andrew)
  • Update/remove/insert value at collection index (thanks, Brad Nussbaum)

Graph Refactoring

  • Per property configurable merge strategy for mergeNodes
  • Means to skip properties for cloneNodes

Other Additions

Other bug fixes in this release of the APOC library include:

  • apoc.load.jdbc (type conversion, connection handling, logging)
  • apoc.refactor.mergeNodes
  • Composite indexes in Cypher export
  • ElasticSearch integration for ES 6
  • Made larger parts of APOC not require the unrestricted configuration
  • apoc.json.toTree (also config for relationship-name casing)
  • Warmup improvements (dynamic properties, rel-group)
  • Compound index using apoc.schema.assert (thanks, Chris Skardon)
  • Explicit index reads no longer require a read-write user
  • Enable parsing of lists in GraphML import (thanks, Alex Wilson)
  • CYPHER_SHELL format now uses lower case instead of upper case (:begin, :commit)
  • Untyped directions are now allowed (thanks, Andrew)


As always, we’re very interested in your feedback, so please try out the new APOC releases, and let us know if you like them and if there are any issues.

Please refer to the documentation or ask in the #neo4j-apoc channel of the neo4j-users Slack if you have any questions.

Enjoy the new release(s)!


DevOps on Graphs: The 5-Minute Interview With Ashley Sun, Software Engineer at LendingClub [Video]

“Basically, anything you can think of in your infrastructure, whether it’s GitHub, Jenkins, AWS, load balancers, Cisco UCS, vCenter – it’s all in our graph database,” said Ashley Sun, Software Engineer at LendingClub.

DevOps at LendingClub is no easy feat: Due to the complexities and dependencies of their internal technology infrastructure – including a host of microservices and other applications – it would be easy for everything to spiral out of control. However, graph technology helps them manage and automate every connection and dependency from top to bottom. 

In this week’s five-minute interview (conducted at GraphConnect New York), Ashley Sun discusses how the team at LendingClub uses Neo4j to gain complete visibility into its infrastructure for deployment and release automation and cloud orchestration. The flexible schema makes it easy for LendingClub to add to and modify its view of the infrastructure, so its graph database remains the single up-to-date source for all queries about its release infrastructure.

Talk to us about how you use Neo4j at LendingClub.

Ashley Sun: We are using Neo4j for everything related to managing the complexities of our infrastructure. We are basically scanning all of our infrastructure and loading it all into Neo4j. We’ve written a lot of deployment and release automation, cloud orchestration, and it’s all built around Neo4j. Basically, anything you can think of in your infrastructure, whether it’s GitHub, Jenkins, Amazon Web Services (AWS), load balancers, Cisco Unified Computing System (UCS), vCenter – it’s all in our graph database.

We’re constantly scanning and refreshing this information so that at any given time, we can query our graph database and receive real-time, current information on the state of our infrastructure.

What made you choose Neo4j?

Sun: At the time, my manager was looking for a database that we could run ad-hoc queries against, something that was flexible and scalable. He actually looked at a few different graph databases and decided Neo4j was the best. 

What are some of the most interesting or surprising results you’ve seen while using Neo4j?

Sun: The coolest thing about Neo4j, for us, has been how flexible and easily scalable it is. If you come from a background of working with traditional SQL databases, where schemas have to be predefined, you’ll find that with Neo4j it’s really easy to build on top of existing nodes, existing relationships, and existing properties. It’s really easy to modify things. Also, it’s really, really easy to query at any time using ad-hoc queries.

We’ve been working with Neo4j for three years, and as our infrastructure has grown and as we’ve added new tools, our graph database has scaled and grown with us and just evolved with us really easily. 

Anything else you’d like to add or say?

Sun: It would be exciting for more tech companies to start using Neo4j to map out their infrastructure and maybe automate their deployments and cloud orchestration using Neo4j. I’d love to hear about how other tech companies are using Neo4j.
