Graph Algorithms in Neo4j: The Neo4j Graph Algorithms Library

The Neo4j Graph Algorithms Library lets you gain new insights from your connected data more easily within Neo4j. These graph algorithms improve results from your graph data, for example by focusing on particular communities or favoring popular entities.

This article series is designed to help you better leverage graph analytics so you can effectively innovate and develop intelligent solutions faster.

Original Link

Graph Algorithms in Neo4j: Neo4j Graph Analytics

At a fundamental level, a native graph platform is required to make it easy to express relationships across many types of data elements. To succeed with connected data applications, you need to traverse these connections at speed, regardless of how many hops your query takes.

This series is designed to help you better leverage graph analytics so you can effectively innovate and develop intelligent solutions faster.

Original Link

Building a Graph Database on a Key-Value Store?

[Excerpted from the eBook Native Parallel Graphs: The Next Generation of Graph Database for Real-Time Deep Link Analytics]

Until recently, graph database designs fulfilled some but not all of the graph analytics needs of enterprises. The first generation of graph databases (e.g., Neo4j) was not designed for big data: these databases cannot scale out to a distributed system, are not designed for parallelism, and are slow at both data loading and querying for large datasets and/or multi-hop queries.

Original Link

Understanding Graph Databases

What Enterprise Architects Should Know

The description of graph databases that you get when you Google the term is mostly academic. I see a lot of descriptions of graph databases that talk about the seven bridges of Königsberg or Berners-Lee, the inventor of the World Wide Web. Those theories and visions are fine, but I still think it's important to lead with the relevance. Why are graph databases important to you?

Imagine the data that's stored in a local restaurant chain. If you were keeping track, you'd store customer information in one database table, the items you offer in another, and the sales that you've made in a third table. This is fine when you want to understand what you sold, order inventory, and know who your best customer is. But what's missing is the connective tissue: the connections between those items, along with functions in the database that let you make the most of them.

Original Link

The Rise of Graph Databases [Video]

A few months ago, we attended Data Summit 2018 here in Boston. A number of topics surrounding big data technologies were discussed at this three-day conference, including AI, ML, graph technology, and moving to the cloud (for a good overview of the topics discussed, read Joyce Wells and Stephanie Simone of DBTA's 12 Key Takeaways About Data and Analytics from Data Summit 2018). One of the more popular presentations at the Data Summit was one given by our CTO, Sean Martin, titled "The Rise of Graph Databases". Here is the video of his presentation.

Original Link

The Year of the Graph [Slides]

Graph technology has truly burst onto the scene with diverse new products and services, proving that graph is relevant and that not all graph use cases are equal. Previously relegated to niche implementations and science projects, graph now finds itself deployed as the foundational technology for enterprise analytics solutions and enterprise Information Fabric strategies. It is no surprise that many are calling 2018 “The Year of the Graph”.

In his presentation to a packed house at Gartner Data & Analytics Summit 2018, Cambridge Semantics’ Ben Szekely, VP of Solution Engineering, discussed why 2018 is the "Year of the Graph" and how Anzo utilizes graph database technology to provide a semantic layer for your data lake. Here are the slides from his presentation. 

Original Link

#GraphCast: Graph Karaoke Featuring The Knife’s ”Heartbeat” [Video]

Welcome to our new biweekly Sunday series, #GraphCast, which aims to unearth digestible, notable, and just plain fun Neo4j YouTube videos (of which there are a lot).

Whether we focus on some of our most popular videos or highlight a particularly solid educational piece on graph technology that may’ve slipped past you, #GraphCast is meant to be short, sweet and the perfect companion piece to your Sunday morning bowl of cereal (or two, if you’re hungry, we don’t judge).

Original Link

New Features, Now: 5-Minute Interview

"One of the most surprising things I’ve seen with Neo4j is the speed at which we’re able to innovate and deliver features to our customers," said Mark Hashimoto, Senior Director of Engineering, Digital Home at Comcast.

In this week’s five-minute interview, we discuss how Comcast uses the flexibility of the graph data model to develop and launch new features rapidly using Neo4j for persistence.

Original Link

Graphs in RavenDB: Real World Use Cases

I talked a lot about how graph queries in RavenDB will work, but one missing piece of the puzzle is how they are going to be used. I'm going to use this post to discuss some of the options that the new features enable. We'll use the following simple model: issue tracking. Let's imagine that a big (secret) project is going on and we need to track down the tasks for it. On the right, you have an image of the permissions graph that we are interested in.

The permission model is pretty standard, I think. We have the notion of users and groups. A user can be associated with one or more groups. Group memberships are hierarchical. An issue's access is controlled either by giving access to a specific user or to a group. The act of assigning a group will also allow access to all of the group's parents and any user associated with any of them.
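
The post goes on to explore this model with RavenDB's own graph query syntax. As a rough sketch of the same access check written in Cypher, purely for illustration (the User, Group, and Issue labels and the MEMBER_OF, PARENT, and GRANTS_ACCESS_TO relationship types are hypothetical names, not from the original post), it could look like this:

// Hypothetical model: (User)-[:MEMBER_OF]->(Group), (child Group)-[:PARENT]->(parent Group),
// and (Issue)-[:GRANTS_ACCESS_TO]->(User or Group).
// Who can see a given issue? Users granted access directly, plus users who are members
// of a granted group or of any of that group's parent groups.
MATCH (i:Issue {id: 'issues/1'})
OPTIONAL MATCH (i)-[:GRANTS_ACCESS_TO]->(u:User)
OPTIONAL MATCH (i)-[:GRANTS_ACCESS_TO]->(:Group)-[:PARENT*0..]->(:Group)<-[:MEMBER_OF]-(m:User)
RETURN collect(DISTINCT u.name) + collect(DISTINCT m.name) AS usersWithAccess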

Original Link

How to Know What You Know: 5-Minute Interview

"I want to know what I know. That describes what knowledge graphs do for companies," said Dr. Alessandro Negro, Chief Scientist at GraphAware.

In this week’s five-minute interview, we discuss how GraphAware uses natural language processing to help companies gain a better understanding of the knowledge that is spread across their organization.

Original Link

Graphs in RavenDB: Recursive Queries

Graph queries, as I have discussed them so far, give you the ability to search for patterns. Above, you can see the family tree of the royal family of Great Britain going back a few hundred years. That makes for an interesting subject for practicing graph queries.

A good example of a question we might want to ask is: who are the royal grandparents of Elizabeth II? We can do that using:
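
The RavenDB query itself is not reproduced in this excerpt. As a rough Cypher equivalent, purely for illustration (the Person label and CHILD_OF relationship type are hypothetical), the grandparent lookup is a fixed two-hop pattern, and the truly recursive version just relaxes the hop count:

// Grandparents: exactly two CHILD_OF hops up the family tree.
MATCH (liz:Person {name: 'Elizabeth II'})-[:CHILD_OF*2]->(grandparent:Person)
RETURN grandparent.name
// For all ancestors at any depth, use [:CHILD_OF*1..] instead of [:CHILD_OF*2].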

Original Link

Graphs in RavenDB: Inconsistency Abhorrence

In my previous post, I discussed some options for changing the syntax of graph queries in RavenDB from Cypher to be more in line with the rest of the RavenDB Query Language. We have now completed that part and can see the real impact it has on the overall design.

In one of the design reviews, one of the devs (who has built non-trivial applications using Neo4j) complained that the syntax is now much longer. Here are the before and after queries for comparison:

Original Link

Graphs in RavenDB: Selecting the Syntax

When we started building support for graph queries inside RavenDB, we looked at the state of the market in this regard. There seem to be two major options: Cypher and Gremlin. Gremlin is basically a fluent interface that represents a specific graph pattern, while Cypher is a more abstract way to represent the graph query. I don't like Gremlin, and it doesn't fit into the model we have for RQL, so we went for the Cypher syntax. Note the distinction between went for Cypher and went for the Cypher syntax.

One of the major requirements that we have is fitting it into the pre-existing Raven Query Language, but the first concern we had was just getting started and getting some idea about our actual scenarios. We are now at the point where we have written a bunch of graph queries and got a lot more experience in how it meshes into the overall environment. And at this point, I can really feel that there is an issue in meshing Cypher syntax into RQL. They don’t feel the same at all. There are a lot of good ideas there, make no mistake, but we want to create something that would flow as a cohesive whole.

Original Link

Building Enterprise Performance Into a Graph Database

This article is featured in the new DZone Guide to Databases: Relational and Beyond. Get your free copy for more insightful articles, industry statistics, and more!

This article introduces some of the key concepts of graph databases and their use cases. It explains how a focus on speed and scalability allows a modern “second-generation” graph database to tackle enterprise-level use cases to enable delivery of high-performance machine learning.

Original Link

Graphs in RavenDB: What’s the Role of the Middle Man?

An interesting challenge with implementing graph queries is that you sometimes get into situations where the correct behavior is counter-intuitive.

Consider the case of the graph on the right and the following query:

Original Link

Graphs in RavenDB: Query Results

We ran into an interesting design issue when building graph queries for RavenDB. The problem statement is fairly easy: should a document be allowed to be bound to multiple aliases in the query results, or just one? However, without context, the problem statement is not meaningful, so let's talk about what the actual problem is. Consider the graph on the right. We have three documents, Arava, Oscar, and Phoebe, and the following edges:

  • Arava Likes Oscar
  • Phoebe Likes Oscar


Original Link

Effective Internal Risk Models for FRTB Compliance: Modern Graph Technology

Relational database technology can’t handle what is coming in banking and risk modeling. By the 2020s, Accenture predicts current banking business models will be swept away by a tide of ever-evolving technology and other rapidly occurring changes.

The right foundation for building compliance solutions is graph database technology. Neo4j answers the demands of Fundamental Review of the Trading Book (FRTB) regulations while building a foundation for future investment and risk compliance applications. Neo4j is the world’s leading graph database platform and the ideal solution for tracking investment data lineage.

Original Link

Half-Terabyte Benchmark Neo4j vs. TigerGraph

Graph databases have been becoming more and more popular and are getting lots of attention.

In order to know how graph databases perform, I researched the state-of-the-art benchmarks and found that loading speed, loaded data storage, query performance, and scalability are the common benchmark features. However, those benchmarks' testing datasets are too small, ranging from 4 MB to 30 GB. So, I decided to do my own benchmark. Let's play with a huge dataset: half a terabyte.

Original Link

Exploring DGraph: Getting Started

This article is primarily aimed at getting DGraph up and running on your local machine, creating a schema in DGraph (alter), adding/deleting data in DGraph (mutation), and getting data back out of it (query). We are going to keep the article simple, exploring each of the mentioned operations in detail in later blogs.

Installing and Running DGraph on Your Local Machine

There are multiple ways to install and run DGraph on your local machine, but the simplest one, in my opinion, is to download the docker-compose.yml file from their docs and run it with docker-compose. Alternatively, you can copy and paste the content below into a file named docker-compose.yml:

Original Link

Graphs4Good: Connected Data for a Better World

You’re reading this because of a napkin.

It was the year 2000, and I was on a flight to Mumbai. Peter, Johan, and I had been building an enterprise content management system (ECM) but kept running up against the challenge of using an RDBMS for querying connected data.

Original Link

Fighting Money Laundering and Corruption With Graph Technology

The shocking revelations of the International Consortium of Investigative Journalists (ICIJ), who released both the Panama and Paradise Papers, as well as the West Africa Leaks, have shown that aggressive tax avoidance and money laundering are a widespread and worldwide problem.

Money laundering often correlates with other illegal activities such as terrorist financing and corruption in politics and businesses, while tax avoidance leads to political and social tensions.

Original Link

Amazon Neptune, The Truth Revealed

In May this year, Amazon announced the general availability of its cloud graph database service, Amazon Neptune. Here is a comprehensive blog that summarizes its strengths and weaknesses. Last year, we published a benchmark report on Neo4j, Titan, and TigerGraph. It was intriguing for us to find out how Amazon Neptune performs relative to the other three graph databases. To answer this question, we conducted the same benchmark on Amazon Neptune. This article presents the discoveries revealed by our benchmark.

Executive Summary

Amazon Neptune provides both RDF and Property Graph models. For this benchmark, we focused on the Property Graph model, which uses Gremlin as its query interface.

Original Link

What Are the Criteria to Differentiate Between Graph Databases?

Graph databases have gotten much attention due to their advantages over relational models (see discussions here). However, as different technology companies (Amazon, Microsoft, Oracle, IBM, etc.) rush into this area, it is getting more challenging to evaluate the different vendors' products when a project wants to adopt a graph database.

In this post, I'm going to share insights on how to evaluate a graph database, gained from benchmarking several of them. You can download the benchmark report here.

Original Link

Building a Dating Site With Neo4j (Part 2)

We came up with an idea for a dating site and an initial model in Part One. Next, we are going to work on a back-end HTTP API, because I'm old school and that's the way I like it. We will build our HTTP API right into Neo4j using an extension, which turns Neo4j from a Server into a Service. Unlike last time, when we wrote a clone of Twitter, I don't really know where I'm going with this, so let's start with some of the obvious API endpoints and then we can design and build more as we go along. Is this Agile or am I just being an idiot? I can't tell, so onward we go.

The first obvious thing is, we need a schema. Luckily, Neo4j is a "Schema Optional" database, so we don't have to worry about designing any tables or properties or figuring out what kind of properties each table will have. Because… well, we don't have tables. The only real schema we need to worry about is Constraints and Indexes. For example, we don't want two users to have the same username or same email, so we will create a uniqueness constraint on those Label-Property combinations. We also want our users to pick Attributes they have and Attributes they want in a potential mate. To keep things clean and help with matching, we will seed the database with some Attributes and not let the users create them dynamically. However, users need to be able to find and search for these Attributes, so we will index their names. Well, we will index a lowercase version of their names, since the current Neo4j schema indexes are CaSe SeNsItIve. So our schema endpoint could start like this:
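
The endpoint code itself is not included in this excerpt, but the schema statements it would run look roughly like this (a sketch in Neo4j 3.x Cypher syntax; the lowercase_name property is an assumed naming choice):

// Uniqueness constraints on the Label-Property combinations mentioned above.
CREATE CONSTRAINT ON (u:User) ASSERT u.username IS UNIQUE;
CREATE CONSTRAINT ON (u:User) ASSERT u.email IS UNIQUE;
// Index a lowercased copy of the attribute name, since schema indexes are case sensitive.
CREATE INDEX ON :Attribute(lowercase_name);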

Original Link

Building a Dating Site With Neo4j (Part 1)

You might have already heard that Facebook is getting into the dating business. Other dating sites have been using graphs in the past, and we've looked at finding love using the graph before. It has been a while though, so let's return to the topic, making use of the new Date and Geospatial capabilities of Neo4j 3.4. I have to warn you, though, that I've been with Helene for almost 15 years and missed out on all this dating site fun; what I do know I blame on Colin and some pointers from the comments section of this blog post.

Dating sites face a series of challenges; the first one is a lack of users. There are only two ways to fix that: the first involves having lots of money to pay for national advertisements, and the second involves word of mouth. So you, dear reader, have to either invest a few million dollars or join our new dating site and tell all your friends about it.

Original Link

Neo4j Launches Commercial Kubernetes Application on Google Cloud Platform Marketplace

On behalf of the Neo4j team, I am happy to announce that today we are introducing the availability of the Neo4j Graph Platform within a commercial Kubernetes application to all users of the Google Cloud Platform Marketplace.

This new offering provides customers with the ability to easily deploy Neo4j’s native graph database capabilities for Kubernetes directly into their GKE-hosted Kubernetes cluster.

The Neo4j Kubernetes application will be “Bring Your Own License” (BYOL). If you have a valid Neo4j Enterprise Edition license (including startup program licenses), the Neo4j application will be available to you.

Commercial Kubernetes applications can be deployed on-premise or even on other public clouds through the Google Cloud Platform Marketplace.

What This Means for Kubernetes Users

We’ve seen the Kubernetes user base growing substantially, and this application makes it easy for that community to launch Neo4j and take advantage of graph technology alongside any other workload they may use with Kubernetes.

Kubernetes customers are already building some of these same applications. Using Neo4j on Kubernetes, a user can combine the graph capabilities of Neo4j with an existing application, such as an application that generates recommendations by looking at the behavior of similar buyers, or a 360-degree customer view that uses a knowledge graph to help spot trends and opportunities.

GCP Marketplace + Neo4j

GCP Marketplace is based on a multi-cloud and hybrid-first philosophy, focused on giving Google Cloud partners and enterprise customers flexibility without lock-in. It also helps customers innovate by easily adopting new technologies from ISV partners, such as commercial Kubernetes applications, and allows companies to oversee the full lifecycle of a solution, from discovery through management.

As the ecosystem leader in graph databases, Neo4j has supported containerization technology, including Docker, for years. With this announcement, Kubernetes customers can now easily pair Neo4j with existing applications already running on their Kubernetes cluster or install other Kubernetes marketplace applications alongside Neo4j.

Original Link

Neo4j 3.4 Release Highlights in Less Than 8 Minutes [Video]

Hi everyone,

My name is Ryan Boyd, and I’m on the Developer Relations team here at Neo4j. I want to talk to you today about our latest release, Neo4j 3.4.

Overview

In Neo4j 3.4, we’ve made improvements to the entire graph database system, from scalability and performance to operations, administration, and security. We’ve also added several new key features to the Cypher query language, including spatial querying support and date/time types.

Scalability

Let’s talk about the scalability features in Neo4j 3.4.

In this release, we’ve added Multi-Clustering support. This allows your global Internet apps to horizontally partition their graphs by domain, such as country, product, customer or data center.

Now, why might you want to do this? You might want to use this new feature if you have a multi-tenant application that wants to store each customer’s data separately. You might also want to use this because you want to geopartition your data for certain regulatory requirements or if you want enhanced write scaling.

Look at the four clusters shown in the image above. Each of these clusters has a different graph, but they are managed together. They can also be used by a single application with Bolt routing the right data to the right cluster, and the data is kept completely separate.

Read Performance

As with all releases, in Neo4j 3.4 we made a number of improvements to read performance.

If you look at a read benchmark in a mixed workload environment, you can see that from Neo4j 3.2 to 3.3 we improved performance by 10%.

Now, for this release, we spent the last several release cycles working on an entirely new runtime for Neo4j Enterprise Edition. I’m proud to say that in Neo4j 3.4, we’ve made all queries use this new Cypher runtime, and that improves performance by roughly 70% on average.

Write Performance

Write performance is also important.

In our ongoing quest to take writes to the next level, we’ve been hammering away at one component that incurs roughly 80% of all overhead when writing to a graph. Now, what component it is may not be so obvious — it’s indexes.

Lucene is fantastic at certain things. It’s awesome at full text, for instance, but it turns out to be not so good for ACID writes with individually indexed fields. So, we’ve moved from using Lucene as our index provider to using our native Neo4j index.

We’ve actually moved to a native index for our label groupings in 3.2, for numerics in 3.3, and now, with the string support in 3.4, we’ve added a lot of the common property types to the new native index. This is what results in our significantly faster performance on writes.

Our native index is optimized for graphs. Its ACID compliance gives you fast reads and, as you can see, approximately 10 times faster writes. The image below shows the write performance of the first 3.4 release candidate when writing strings.

At the point at which we implemented the new native string index, we saw approximately a 500% improvement in overall write performance.

Ops and Admin

We’ve also made a number of improvements around operations and administration of Neo4j in the 3.4 release. Perhaps the most important is rolling upgrades.

Neo4j powers many mission-critical applications, and something many customers have told us is that they want the ability to upgrade their cluster without any planned downtime. This feature enables just that. So if you’re moving from Neo4j 3.4 to the next release, you could do it by upgrading each member in the cluster separately in a rolling fashion.

Neo4j 3.4 also adds auto cache reheating. So, let’s say that you normally heat up your cache when your Neo4j server starts. When you restart your server the next time, we’ll automatically handle the reheating of your cache for you.

The performance of backups is also important to many of our customers, and backups are now two times faster.

Spatial & Date/Time Data

With Neo4j 3.4, we’ve now added the power of searching by spatial queries. Our geospatial graph queries allow you to search in a radius from a particular point and find all of the items that are located within that radius. This is indexed and highly performant.

In addition to supporting the standard X and Y dimensions, we’ve also added support so that you can run your queries in three dimensions. Now, how you might use this is totally up to you.

Think about a query like “Recommend a shirt available in a store close by in the men’s department.” You can take your location and find the different stores. And then, once you’re in a particular store, you can use that third dimension support — the Z axis — to find the particular floor and rack where that shirt is available.
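
As a rough illustration of the radius search described above (a minimal sketch; the Store label, location property, and coordinates are assumptions rather than examples from the release):

// Find stores within 5 km of the customer's position using the Neo4j 3.4 spatial functions.
WITH point({latitude: 42.3601, longitude: -71.0589}) AS me
MATCH (s:Store)
WHERE distance(s.location, me) < 5000
RETURN s.name, distance(s.location, me) AS meters
ORDER BY meters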

In addition to the spatial type, we’ve also added support for date and time operations.
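
For instance, here is a minimal sketch of the new temporal types and duration arithmetic (the values are arbitrary):

// Date/time types and duration arithmetic introduced in Neo4j 3.4.
RETURN date() AS today,
       date('2018-05-01') + duration('P30D') AS thirtyDaysLater,
       datetime() AS rightNow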

Database Security

We’ve also added a new security feature in this release that focuses on property-level security for keeping private data private.

Property-level security allows you to blacklist certain properties so that users with particular roles are unable to access those properties. In this case, users in Role X are unable to read property A, and users with Role Y are unable to read properties B and C.

Try It Out with the Neo4j Sandbox

For the GA release of Neo4j 3.4, we've created a special Neo4j Sandbox. The 3.4 sandbox has a guide that walks you through the new date/time types and spatial querying support.

Watch the video for a quick demo of the new Neo4j Sandbox, or try it out yourself by clicking below.

Try Out the Neo4j Sandbox

Original Link

Offers With Neo4j

Neo4j has many retailers as clients, and one of their use cases is making offers to their customers. I was with a client today who had seen my boolean logic rules engine and decision tree blog posts. They were considering going that route for their offers, but threw down the challenge of handling offers using just Cypher. Their requirements were that offers can be of three types: "AllOf" offers, which require that the customer meet all of the requirements in order to be triggered; "AnyOf" offers, which require just one of the requirements to be met; and "Majority" offers, which require the majority of requirements to be met. The model could look like this:

Let’s go ahead and create some sample data:


CREATE (o:Offer { name: "Offer 1", type:"Majority", from_date: date({ year: 2018, month: 5, day: 1 }), to_date: date({ year: 2018, month: 5, day: 30 }) }),
(req1:Requirement {id:"Product 1"})<-[:REQUIRES]-(o),
(req2:Requirement {id:"Product 2"})<-[:REQUIRES]-(o),
(req3:Requirement {id:"New Customer"})<-[:REQUIRES]-(o),
(req4:Requirement {id:"In Illinois"})<-[:REQUIRES]-(o),
(o2:Offer { name: "Offer 2", type:"AnyOf", from_date: date({ year: 2018, month: 5, day: 1 }), to_date: date({ year: 2018, month: 5, day: 30 })}),
(req5:Requirement {id:"Existing Customer"})<-[:REQUIRES]-(o2),
(req6:Requirement {id:"Last Purchase > 30 Days Ago"})<-[:REQUIRES]-(o2),
(req7:Requirement {id:"In California"})<-[:REQUIRES]-(o2),
(o3:Offer { name: "Offer 3", type:"AllOf", from_date: date({ year: 2018, month: 5, day: 1 }), to_date: date({ year: 2018, month: 5, day: 30 })}),
(req1)<-[:REQUIRES]-(o3),
(req2)<-[:REQUIRES]-(o3),
(req3)<-[:REQUIRES]-(o3)

It looks like this in the Neo4j browser:

Now we are ready to write our query. It needs to return offers that are valid today, and they need to be relevant to the customer, so they need to have at least one requirement in common with the customer. We must return the offer, the requirements we meet, all of the offer's requirements, the missing requirements, and whether or not we qualify for the offer. That sounds pretty complicated, but let's see the finished query and then we can walk through it in steps:


MATCH (req:Requirement)<-[:REQUIRES]-(o:Offer)
WHERE o.from_date < date() < o.to_date AND req.id IN ["Product 1", "Product 2", "In Illinois", "Existing Customer"]
WITH o, COLLECT(req.id) AS have
MATCH (o)-[:REQUIRES]->(reqs:Requirement)
WITH o, have, COLLECT(reqs.id) AS need
RETURN o, have, need, CASE o.type WHEN "AnyOf" THEN ANY(x IN need WHERE x IN have)
WHEN "AllOf" THEN ALL(x IN need WHERE x IN have)
WHEN "Majority" THEN SIZE(have) > SIZE(need)/2.0
END AS qualifies, FILTER(x IN need WHERE NOT x IN have) AS missing

Not bad, right? If you have never used the Cypher CASE expression or the FILTER function, click on those links to learn more about them. So, what's our query doing? The first thing we want to do is use the "date()" function from Neo4j 3.4 to get today's date and compare it to the from_date and to_date of our offers. The offers need to have at least one requirement that the user has, so we MATCH and use an "IN" clause to find them and collect them into a list per offer that we call "have."


MATCH (req:Requirement)<-[:REQUIRES]-(o:Offer)
WHERE o.from_date < date() < o.to_date AND req.id IN ["Product 1", "Product 2", "In Illinois", "Existing Customer"]
WITH o, COLLECT(req.id) AS have

Next, we find all of the requirements for our offer and collect them in a list we call “need.”


MATCH (o)-[:REQUIRES]->(reqs:Requirement)
WITH o, have, COLLECT(reqs.id) AS need

Next, we return the Offer and the have and need lists, and we use a CASE expression to figure out whether we meet the requirements of the offer. If the offer is of type "AnyOf," we just need to make sure that any requirement that we have is in the requirements that we need. If the offer is of type "AllOf," we need to make sure ALL the requirements are met. The ANY and ALL keywords are predicates in Cypher that return TRUE or FALSE.


RETURN o, have, need, CASE o.type WHEN "AnyOf" THEN ANY(x IN need WHERE x IN have)
WHEN "AllOf" THEN ALL(x IN need WHERE x IN have)

If the offer is of type "Majority," then we make sure the size of the have list is greater than half the size of the need list. That gives us a strict majority; if we wanted "at least 50%," we could make it a greater-than-or-equal comparison instead. Finally, we want to return the missing requirements as well. We use a FILTER to get the list of missing requirements by checking each requirement in need and seeing whether it is missing from the list of have.


WHEN "Majority" THEN SIZE(have) > SIZE(need)/2.0
END AS qualifies, FILTER(x IN need WHERE NOT x IN have) AS missing

and there we have it:

So give it a shot: try changing the requirements passed in the array and see how the results change. Remember, you will need Neo4j 3.4.0 or higher because of the use of the new date datatype. So go get it.

Before we end, there are other ways to write this query. For example, we could have written the CASE expression this way:


RETURN o, have, need, CASE o.type WHEN "AnyOf" THEN true
WHEN "AllOf" THEN SIZE(have) = SIZE(need)

It works because "AnyOf" is always true, since we wouldn't have gotten to the offer if none of the requirements matched. Instead of using the ALL predicate, we could simply compare the sizes of the two lists for "AllOf." You may have been tempted to write "have = need," but the order of the items in the lists is not guaranteed, and out-of-order lists are not equal even if they contain the same values.

Original Link

It’s Time for a Single Property Graph Query Language [Vote Now]

The time has come to create a single, unified property graph query language.

Different languages for different products help no one. We’ve heard from the graph community that a common query language would be powerful: more developers with transferable expertise, portable queries, solutions that leverage multiple graph options, and less vendor lock-in.

One language, one skill set.

The Property Graph Space Has Grown…A Lot

Property graph technology has a big presence from Neo4j and SAP HANA to Oracle PGX and Amazon Neptune. An international standard would accelerate the entire graph solution market, to the mutual benefit of all vendors and — more importantly — to all users.

That’s why we are proposing a unified graph query language, GQL (Graph Query Language), that fuses the best of three property graph languages.

Relational Data Has SQL, and Property Graphs Need GQL

Although SQL has been fundamental for relational data, we need a declarative query language for the powerful — and distinct — property graph data model to play a similar role.

Like SQL, the new GQL needs to be an industry standard. It should work with SQL but not be confined by SQL. The result would be better choices for developers, data engineers, data scientists, CIOs, and CDOs alike.

Right now, there are three property graph query languages that are closely related. We have Cypher (from Neo4j and the openCypher community), we have PGQL (from Oracle), and we have G-CORE, a research language proposal from the Linked Data Benchmark Council [LDBC] (co-authored by world-class researchers from the Netherlands, Germany, Chile, and the U.S., and technical staff from SAP, Oracle, Capsenta, and Neo4j).

The proposed GQL (Graph Query Language) would combine the strengths of Cypher, PGQL, and G-CORE into one vendor-neutral and standardized query language for graph solutions, much like SQL is for RDBMS.

Each of these three query languages has similar data models, syntax, and semantics. Each has its merits and gaps, yet their authors share many ambitions for the next generation of graph querying, such as a composable graph query language with graph construction, views, and named graphs; and a pattern-matching facility that extends to regular path queries.

Let Your Voice Be Heard on GQL

The Neo4j team is advocating that the database industry and our users collaborate to define and standardize one language.

Bringing PGQL, G-CORE, and Cypher together, we have a running start. Two of them are industrial languages with thousands of users, and combined with the enhancements of a research language, they all share a common heritage of ASCII art patterns to match, merge, and create graph models.

What matters most right now is a technically strong standard with strong backing among vendors and users. So we’re appealing for your vocal support.

Please vote now on whether we should unite to create a standard Graph Query Language (GQL), in the same manner as SQL.

Should the property graph community unite to create a standard Graph Query Language, GQL, alongside SQL?

For more information, you can read the GQL manifesto here and watch for ongoing updates.

Emil Eifrem, CEO;
Philip Rathle, VP of Products;
Alastair Green, Lead, Query Languages Standards & Research;
for the entire Neo4j team

Original Link

Graph Algorithms in Neo4j: 15 Different Graph Algorithms and What They Do

Graph analytics have value only if you have the skills to use them and if they can quickly provide the insights you need. Therefore, the best graph algorithms are easy to use, are fast to execute, and produce powerful results.

Neo4j includes a growing, open library of high-performance graph algorithms that reveal the hidden patterns and structures in your connected data.

In this series on graph algorithms, we’ll discuss the value of graph algorithms and what they can do for you. Previously, we explored how data connections drive future discoveries and how to streamline those data discoveries with graph analytics.

This week, we’ll take a detailed look at the many graph algorithms available in Neo4j and what they do.

Using Neo4j graph algorithms, you'll have the means to understand, model, and predict complicated dynamics such as the flow of resources or information, the pathways along which contagions or network failures spread, and the influences on and resiliency of groups.

And because Neo4j brings together analytics and transaction operations in a native graph platform, you’ll not only uncover the inner nature of real-world systems for new discoveries but also develop and deploy graph-based solutions faster and have easy-to-use, streamlined workflows. That’s the power of an optimized approach.

Here is a list of the many algorithms that Neo4j uses in its graph analytics platform, along with an explanation of what they do.

Traversal and Pathfinding Algorithms

1. Parallel Breadth-First Search (BFS)

What it does: Traverses a tree data structure by fanning out to explore the nearest neighbors and then their sub-level neighbors. It’s used to locate connections and is a precursor to many other graph algorithms.

BFS is preferred when the tree is less balanced or the target is closer to the starting point. It can also be used to find the shortest path between nodes or avoid the recursive processes of depth-first search.

How it's used: Breadth-first search can be used to locate neighbor nodes in peer-to-peer networks like BitTorrent, in GPS systems to pinpoint nearby locations, and in social network services to find people within a specific distance.

2. Parallel Depth-First Search (DFS)

What it does: Traverses a tree data structure by exploring as far as possible down each branch before backtracking. It’s used on deeply hierarchical data and is a precursor to many other graph algorithms. Depth-first search is preferred when the tree is more balanced or the target is closer to an endpoint.

How it’s used: Depth-first search is often used in gaming simulations where each choice or action leads to another, expanding into a tree-shaped graph of possibilities. It will traverse the choice tree until it discovers an optimal solution path (i.e. win).

3. Single-Source Shortest Path

What it does: Calculates the paths from a node to all other nodes such that the summed value (weight of relationships such as cost, distance, time, or capacity) along each path is minimal.

How it’s used: Single-source shortest path is often applied to automatically obtain directions between physical locations, such as driving directions via Google Maps. It’s also essential in logical routing, such as telephone call routing (least-cost routing).

4. All-Pairs Shortest Path

What it does: Calculates a shortest path forest (group) containing all shortest paths between the nodes in the graph. It’s commonly used for understanding alternate routing when the shortest route is blocked or becomes sub-optimal.

How it's used: All-pairs shortest path is used to evaluate alternate routes for situations such as a freeway backup or limited network capacity. It's also key in logical routing that offers multiple paths; for example, call routing alternatives.

5. Minimum Weight Spanning Tree (MWST)

What it does: Calculates the paths along a connected tree structure with the smallest value (weight of the relationship such as cost, time, or capacity) associated with visiting all nodes in the tree. It’s also employed to approximate some NP-hard problems such as the traveling salesman problem and randomized or iterative rounding.

How it’s used: Minimum weight spanning tree is widely used for network designs: least-cost logical or physical routing such as laying cable, fastest garbage collection routes, capacity for water systems, efficient circuit designs, and much more. It also has real-time applications with rolling optimizations, such as processes in a chemical refinery or driving route corrections.

Centrality Algorithms

6. PageRank

What it does: Estimates a current node’s importance from its linked neighbors and then again from their neighbors. A node’s rank is derived from the number and quality of its transitive links to estimate influence. Although popularized by Google, it’s widely recognized as a way of detecting influential nodes in any network.

How it’s used: PageRank is used in quite a few ways to estimate importance and influence. It’s used to suggest Twitter accounts to follow and for general sentiment analysis.

PageRank is also used in machine learning to identify the most influential features for extraction. In biology, it's been used to identify which species extinctions within a food web would lead to the biggest chain reaction of species death.
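
To give a flavor of how this looks in practice, here is a minimal sketch using the library's legacy algo.* procedure names (the Page label, LINKS relationship type, and configuration values are placeholders, and exact signatures vary by library version):

// Stream PageRank scores over :Page nodes connected by :LINKS relationships.
CALL algo.pageRank.stream('Page', 'LINKS', {iterations: 20, dampingFactor: 0.85})
YIELD nodeId, score
MATCH (n) WHERE id(n) = nodeId
RETURN n.name AS page, score
ORDER BY score DESC LIMIT 10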

7. Degree Centrality

What it does: Measures the number of relationships a node (or an entire graph) has. It’s broken into indegree (flowing in) and outdegree (flowing out) where relationships are directed.

How it’s used: Degree centrality looks at immediate connectedness for uses such as evaluating the near-term risk of a person catching a virus or hearing information. In social studies, the indegree of friendship can be used to estimate popularity and outdegree as gregariousness.

8. Closeness Centrality

What it does: Measures how central a node is to all its neighbors within its cluster. Nodes with the shortest paths to all other nodes are assumed to be able to reach the entire group the fastest.

How it's used: Closeness centrality is applicable in a number of resource, communication, and behavioral analyses, especially when interaction speed is significant. It has been used to identify the best locations for new public services to maximize accessibility.

In social network analysis, it is used to find people with the ideal social network location for faster dissemination of information.

9. Betweenness Centrality

What it does: Measures the number of shortest paths (first found with breadth-first search) that pass through a node. Nodes that most frequently lie on shortest paths have higher betweenness centrality scores and are the bridges between different clusters. It is often associated with the control over the flow of resources and information.

How it’s used: Betweenness centrality applies to a wide range of problems in network science and is used to pinpoint bottlenecks or likely attack targets in communication and transportation networks. In genomics, it has been used to understand the control certain genes have in protein networks for improvements such as better drug/disease targeting.

Betweenness centrality has also been used to evaluate information flows between multiplayer online gamers and within expertise-sharing communities of physicians.

Community Detection Algorithms

This category is also known as clustering algorithms or partitioning algorithms.

10. Label Propagation

What it does: Spreads labels based on neighborhood majorities as a means of inferring clusters. This extremely fast graph partitioning requires little prior information and is widely used in large-scale networks for community detection. It's a key method for understanding the organization of a graph and is often a primary step in other analyses.

How it’s used: Label propagation has diverse applications, from understanding consensus formation in social communities to identifying sets of proteins that are involved together in a process (functional modules) for biochemical networks. It’s also used in semi- and unsupervised machine learning as an initial preprocessing step.

11. Strongly Connected

What It Does: Locates groups of nodes where each node is reachable from every other node in the same group following the direction of relationships. It’s often applied from a depth-first search.

How it's used: Strongly connected is often used to enable running other algorithms independently on an identified cluster. As a preprocessing step for directed graphs, it helps quickly identify disconnected groups. In retail recommendations, it helps identify groups with strong affinities that are then used for suggesting commonly preferred items to those within that group who have not yet purchased the item.

12. Union-Find/Connected Components/Weakly Connected

What it does: Finds groups of nodes where each node is reachable from any other node in the same group, regardless of the direction of relationships. It provides near constant-time operations (independent of input size) to add new groups, merge existing groups, and determine whether two nodes are in the same group.

How it’s used: Union-find/connected components is often used in conjunction with other algorithms, especially for high-performance grouping. As a preprocessing step for undirected graphs, it helps quickly identify disconnected groups.
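
A minimal sketch of invoking it through the legacy algo.* procedures (the User label, FRIEND relationship type, and empty configuration are placeholders; check the signatures of your installed library version):

// Stream the connected component (set) each node belongs to, then group the members.
CALL algo.unionFind.stream('User', 'FRIEND', {})
YIELD nodeId, setId
MATCH (u) WHERE id(u) = nodeId
RETURN setId, collect(u.name) AS members
ORDER BY size(members) DESC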

13. Louvain Modularity

What it does: Measures the quality (i.e. presumed accuracy) of a community grouping by comparing its relationship density to a suitably defined random network. It’s often used to evaluate the organization of complex networks and community hierarchies in particular. It’s also useful for initial data preprocessing in unsupervised machine learning.

How it’s used: Louvain is used to evaluate social structures on Twitter, LinkedIn, and YouTube. It’s used in fraud analytics to evaluate whether a group has just a few bad behaviors or is acting as a fraud ring that would be indicated by a higher relationship density than average. Louvain revealed a six-level customer hierarchy in a Belgian telecom network.

14. Local Clustering Coefficient/Node Clustering Coefficient

What it does: For a particular node, it quantifies how close its neighbors are to being a clique (every node is directly connected to every other node). For example, if all your friends knew each other directly, your local clustering coefficient would be 1. Small values for a cluster would indicate that although a grouping exists, the nodes are not tightly connected.

How it's used: The local clustering coefficient is important for estimating resilience by understanding the likelihood of group coherence or fragmentation. An analysis of a European power grid using this method found that clusters with sparsely connected nodes were more resilient against widespread failures.

15. Triangle-Count and Average Clustering Coefficient

What it does: Measures how many nodes have triangles and the degree to which nodes tend to cluster together. The average clustering coefficient is 1 when there is a clique and 0 when there are no connections. For the clustering coefficient to be meaningful, it should be significantly higher than a version of the network where all of the relationships have been shuffled randomly.

How it’s used: The average clustering coefficient is often used to estimate whether a network might exhibit “small-world” behaviors that are based on tightly knit clusters. It’s also a factor for cluster stability and resiliency. Epidemiologists have used the average clustering coefficient to help predict various infection rates for different communities.

Conclusion

The world is driven by connections. Neo4j graph analytics reveals the meaning of those connections using practical, optimized graph algorithms including the ones detailed above.

This concludes our series on graph algorithms in Neo4j. We hope these algorithms help you make sense of your connected data in more meaningful and effective ways.

Original Link

The Basics of Databases

Welcome back to our monthly database series! Last time, we took a look at the biggest database articles and news from the month of March. In this article, we’re going to look at some introductory database articles on DZone, explore the concept of databases elsewhere on the web, and look at some publications related to databases.


D(atabase)Zone

Check out some of the top introductory database articles on DZone to understand the basics of databases:

  1. The Types of Modern Databases by John Hammink. Where do you begin in choosing a database? We’ve looked at both NoSQL and relational database management systems to come up with a bird’s eye view of both ecosystems to get you started.
  2. Making Graph Databases Fun Again With Java by Otavio Santana. Graph databases need to be made fun again! Not to worry — the open-source TinkerPop from Apache is here to do just that.
  3. How Are Databases Evolving? by Tom Smith. One way that databases are evolving is through the integration and convergence of technologies on the cloud using microservices.
  4. 10 Easy Steps to a Complete Understanding of SQL by Lukas Eder. Too many programmers think SQL is a bit of a beast. It’s one of the few declarative languages out there, and as such, behaves in an entirely different way from imperative, object-oriented, or even functional languages.
  5. MongoDB vs. MySQL by Mihir Shah. There are many database management systems in the market to choose from. So how about a faceoff between two dominant solutions that are close in popularity?

PS: Are you interested in contributing to DZone? Check out our Bounty Board, where you can apply for specific writing prompts and win prizes!


Databasin’ It Up

Let’s journey outside of DZone and check out some recent news, conferences, and more that should be of interest to database newbies.


Dive Even Deeper Into Database

DZone has Guides and Refcardz on pretty much every tech-related topic, but if you're specifically interested in databases, these will appeal the most to you.

  1. The DZone Guide to Databases: Speed, Scale, and Security. Advances in database technology have traditionally been lethargic. That trend has shifted recently with a need to store larger and more dynamic data. This DZone Guide is focused on how to prepare your database to run faster, scale with ease, and effectively secure your data.
  2. Graph-Powered Search: Neo4j & Elasticsearch. In this Refcard, learn how combining technologies adds another level of quality to search results based on code and examples.

Original Link

Graph Algorithms in Neo4j: Streamline Data Discoveries With Graph Analytics

To analyze the billions of relationships in your connected data, you need efficiency and high performance, as well as powerful analytical tools that address a wide variety of graph problems.

Fortunately, graph algorithms are up to the challenge.

In this series on graph algorithms, we’ll discuss the value of graph algorithms and what they can do for you. Last week, we explored how data connections drive future discoveries. This week, we’ll take a closer look at Neo4j’s Graph Analytics platform and put its performance to the test.

The Neo4j Graph Analytics Platform

Neo4j offers a reliable and performant native-graph platform that reveals the value and maintains the integrity of connected data.

First, we delivered the Neo4j graph database, originally used in online transaction processing with exceptionally fast traversals. Then, we added advanced, yet practical, graph analytics tools for data scientists and solutions teams.

Streamline Your Data Discoveries

We offer a growing, open library of high-performance graph algorithms for Neo4j that are easy to use and optimized for fast results. These algorithms reveal the hidden patterns and structures in your connected data around community detection, centrality, and pathways with a core set of tested (at scale) and supported algorithms.

The highly extensible nature of Neo4j enabled the creation of this graph library and exposure as procedures — without making any modification to the Neo4j database.

These algorithms can be called upon as procedures (from our APOC library), and they’re also customizable through a common graph API. This set of advanced, global graph algorithms is simple to apply to existing Neo4j instances, so your data scientists, solutions developers, and operational teams can all use the same native graph platform.

Neo4j also includes graph projection, an extremely handy feature that places a logical sub-graph into a graph algorithm when your original graph has the wrong shape or granularity for that specific algorithm.

For example, if you’re looking to understand the relationship between drug results for men versus women, but your graph is not partitioned for this, you’ll be able to temporarily project a sub-graph to quickly run your algorithm upon and move on to the next step.
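
As a rough sketch of what such a projection can look like using the library's Cypher projection mode (the Person and Drug labels, the TOOK relationship, and the gender property are hypothetical, and signatures vary by library version):

// Run PageRank over a projected sub-graph of female patients who took the same drug.
CALL algo.pageRank.stream(
  'MATCH (p:Person {gender: "F"}) RETURN id(p) AS id',
  'MATCH (a:Person)-[:TOOK]->(:Drug)<-[:TOOK]-(b:Person) RETURN id(a) AS source, id(b) AS target',
  {graph: 'cypher'})
YIELD nodeId, score
RETURN nodeId, score
ORDER BY score DESC LIMIT 10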

Example: High Performance of Neo4j Graph Algorithms

Neo4j graph algorithms are extremely efficient so you can analyze billions of relationships using common equipment and get your results in seconds to minutes, and in a few hours for the most complicated queries.

The chart below shows how Neo4j’s optimized algorithms yield results up to three times faster than Apache Spark(TM) GraphX for Union-Find (Connected Components) and PageRank on the Twitter-2010 dataset with 1.4 billion relationships.

Even more impressive, running the Neo4j PageRank algorithm on a significantly larger dataset with 18 billion relationships and 3 billion nodes delivered results in only 1 hour and 45 minutes (using 144 CPUs and 1TB of RAM).

In addition to optimizing the algorithms themselves, we’ve parallelized key areas such as loading and preparing data as well as algorithms like breadth-first search and depth-first search where applicable.

Conclusion

As you can see, using graph algorithms helps you surface hidden connections and actionable insights obscured within your hordes of data. But even more importantly, the right graph algorithms are optimized to keep your computing costs and time investment to a minimum. Those graph algorithms are available to you now via the Neo4j Graph Platform, and they're waiting to help you with your next data breakthrough.

Next week, we’ll explore specific graph algorithms, describing what they do and how they’re used.

Original Link

How to Do Graph Analysis on PostgreSQL With Arcade

Graph visualization and analysis are critical tools to have in your toolkit. Developers, analysts, business executives, and really anyone that uses data can use graph visualization tools to extract information from data and see how the data interacts. It is one thing to know that something exists; it is another to see how it affects and is affected by the things around it.

In today's market, there are several graph visualization tools that are able to connect to various graph databases. Most people find that issues arise when they do not have access to a graph database but want to use a graph visualization tool. Most graph visualization tools do not have the ability to integrate with a relational database, the most commonly used kind of database. One solution is migration. Companies can spend massive amounts of money to move all of their data from a relational database to a graph database. However, for many people, this is not a solution. Migrations are expensive and time-consuming, and many people do not want to deal with the headaches they can bring.

If you are in this situation, an attractive alternative is Arcade Analytics. Arcade Analytics is a graph visualization tool that gives users more control over their data. It sits on top of the user's database and allows users to query data and show it in a graph. One of the most attractive features of Arcade Analytics is that it allows users to query data from a relational database and visualize it as a graph. Arcade's RDBMS connector allows users to perform graph analysis over their RDBMS without any migration and with a few simple steps.

To understand how this is possible, let’s explore the RDBMS Connector. 

How Does It Work?

You can visually inspect relationships and connections within your RDBMS and treat your data as a graph.

The key to achieving that is the model mapping between the source data model, the entity-relationship model (ER), and the target data model, the graph model.

Once a coherent and effective model mapping is performed (don’t worry, that’s a completely automated process), you can query your source dataset and play with it as if it were a graph:

  • Each record is transformed into a vertex.
  • Each connection between two records, inferred through a relationship between two tables and computed through a join operation, generates an edge.


The ER model is built starting from the source DB schema: each table (also known as an Entity) and each Relationship in the DB are inferred from the metadata.

This automated mapping strategy adopts a basic approach: the source DB schema is directly translated as follows:

  1. Each Entity in the source DB is converted into a vertex type.
  2. Each Relationship between two Entities in the source DB is converted into an edge type.

All the records of each table are handled according to this schema mapping: each pair of records that can be joined will correspond to a pair of vertices connected by an edge of a specific edge type.

Arcade allows connections to Oracle, SQLServer, MySQL, PostgreSQL, and HyperSQL. To show you how the tool works, I chose a sample relational database. We will connect to a PostgreSQL server and we will perform a graph analysis on the DVD Rental database. To learn more about this specific dataset, click here.

Here is the source database schema (AKA the ER Model).


According to the mapping rules stated above, the following correspondent graph model will be automatically built from the tool.


Really simple, isn’t it?! Still some doubts? Okay, let’s see a couple of mapping examples.

1-N Relationship Between Film and Language

The Film and Language source tables are translated into two corresponding vertex types. The properties contained in the Film and Language vertex types come directly from the columns belonging to the two source tables.

The logical relationship between the two tables generates the HasLanguage edge type.
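
To picture the result, here is a small Cypher-style sketch of the mapped graph for this example (Arcade is queried through its own UI, not Cypher; the query below is purely illustrative, using the labels and edge type described above):

// Films and the language they are mapped to, via the HasLanguage edge type
MATCH (f:Film)-[:HasLanguage]->(l:Language)
RETURN f.title AS film, l.name AS language
LIMIT 5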


N-N Relationship Between Actors

In RDBMSs, N-N relationships are expressed through join tables. In the sample schema below, you can see how the N-N relationship between actors and movies is modeled through the join table Film_Actor.


As you can see, the central join table is also translated into a specific vertex type, allowing you to traverse the N-N relationship between actors and movies.

For this reason, traversing an N-N relationship is equivalent to traversing two 1-N relationships, just as in the relational world (see the sketch after this list):

  • 1-N relationship between Film and Film_Actor
  • 1-N relationship between Film_Actor and Actor
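
Here is the same traversal written as a Cypher-style sketch (again, purely illustrative: the has_actor edge type name is an assumption, while has_film matches the edge type mentioned later in this post):

// Traverse the N-N relationship as two 1-N hops through the Film_Actor join-table vertex
// (has_actor is a hypothetical edge type name)
MATCH (a:Actor)<-[:has_actor]-(fa:Film_Actor)-[:has_film]->(f:Film)
RETURN a.first_name AS actor, f.title AS film
LIMIT 5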

Now, let’s start to play with data!

Retrieving Data and Performing Our First Analysis

Once connected to the source relational database, we will get a new empty widget, like the following.


First, we will retrieve some data. We have a few main ways to do that:

Text search bar:


Typing a query against our source relational database:


We will start with a specific actor, retrieving his information through a full-text search on the name. Then, we will expand all his connections present in the source database. For the purposes of this example, I will search for Christian Gable.


Let’s load the vertex by clicking the designated button. Below, you can see the new vertex: to inspect its content, we just have to open the Graph Element menu.


Now, we can start our analysis: let’s suppose we want to find all the customers who rented a movie where Christian Gable performed as an actor. We can simply do that by navigating all the relationships of our graph model by using the Traverse menu.

First, we have to retrieve all the vertices of the film_actor join table. Doing this will also allow us to fetch the film vertices.


Now, starting from the movies, we can expand all 270 ingoing relationships.


We have now retrieved all the vertices connected to each specific movie — in this case, the has_film edge type is mapped with three different relationships:

  1. inventory → film
  2. film_category (join table) → film
  3. film_actor (join table) → film

In order to find all the customers who rented these movies, we should focus just on the inventory → film relationship, so we can get rid of the useless vertices belonging to the film_category class and the new film_actor nodes by deleting them (click the Class button on the right-hand side to select the desired nodes and use the Delete button in the sidebar).


Now we can navigate the ingoing has_inventory edges, connecting inventory vertices with those belonging to the rental class.


We arrive at the last connection: from each rental vertex, we can reach a specific customer by expanding the outgoing has_customer relationship.


Here we are. We have found all the customers who rented a movie where they could see a wonderful performance by the great Christian Gable!

Now, what if we want to narrow our analysis to the subset of customers who rented a specific Christian Gable movie?

Very simple: we can select a specific film and play with the selection of the ingoing and outgoing elements up to a certain depth (Selection menu), as shown in the following screenshots.


At this point, we can delete all the rest of the graph by inverting the current selection through the Invert operation.


Then, we can delete them all with a click… and here is the result.


This RDBMS connector is still a beta version, and there are some limitations. For example, if you did not define constraints such as foreign keys between the tables you usually join on, this information is lost during the querying process.

Because of this, if foreign keys are missing, you will not have any edges in your final graph model, and you will not be able to traverse any relationships.

To overcome these limitations, the Arcade team is working on a visual mapping tool that lets users edit the basic mapping. This way, you will be able to add connections to your dataset by defining new edge types between vertices.

In this post, we had just a taste of the analyses we can perform over a relational database thanks to Arcade; new features will be shown in upcoming posts.

I hope this post was helpful and interesting.

Stay tuned!

To play with this data yourself, click here to access Arcade’s online demo.

Original Link

Scheduling Meetings With Neo4j

One of the symptoms of any fast-growing company is a lack of available meeting rooms. The average office worker gets a jolt of satisfaction in an otherwise mundane workday when they get to kick someone else out of a meeting room they booked. Of course, that joy can be cut short (along with their career) once they realize some unnoticed VIP was unceremoniously kicked out. It’s not a super exciting use case, but today, I’m going to show you how to use Neo4j to perform some scheduling gymnastics.

Let’s start with what the data model looks like:

So, we have a Person that sits in a Cubicle that is located in a Floor that has meeting Rooms where Meetings are booked, and these Meetings are attended by People. That’s a nice circular model right there. Let’s build an example with three people, each in their own cubicle; two floors; four meeting rooms, two on each floor; and a bunch of meetings. We’ll also have one of the people booked in one of the existing meetings. We will use Longs for the times, representing Unix Epoch time in milliseconds. In Neo4j 3.4, we will have legitimate date and datetime data types, so you will be able to create date times like localdatetime({year:1984, month:10, day:11, hour:12, minute:31, second:14}) instead of this hot mess, but regardless, here is the Cypher for this example:

CREATE (person1:Person {name: "Max"})
CREATE (person2:Person {name: "Alex"})
CREATE (person3:Person {name: "Andrew"})
CREATE (cube1A:Cubicle {name: "F1A"})
CREATE (cube1B:Cubicle {name: "F1B"})
CREATE (cube2A:Cubicle {name: "F2A"})
CREATE (floor1:Floor {name: "Floor 1"})
CREATE (floor2:Floor {name: "Floor 2"})
CREATE (person1)-[:SITS_IN]->(cube1A)
CREATE (person2)-[:SITS_IN]->(cube1B)
CREATE (person3)-[:SITS_IN]->(cube2A)
CREATE (cube1A)-[:LOCATED_IN]->(floor1)
CREATE (cube1B)-[:LOCATED_IN]->(floor1)
CREATE (cube2A)-[:LOCATED_IN]->(floor2)
CREATE (room1:Room {name:"Room 1"})
CREATE (room2:Room {name:"Room 2"})
CREATE (room3:Room {name:"Room 3"})
CREATE (room4:Room {name:"Room 4"})
CREATE (room1)-[:LOCATED_IN]->(floor1)
CREATE (room2)-[:LOCATED_IN]->(floor1)
CREATE (room3)-[:LOCATED_IN]->(floor2)
CREATE (room4)-[:LOCATED_IN]->(floor2)
CREATE (m1:Meeting {start_time: 1521534600000, end_time:1521538200000}) // 8:30-9:30am
CREATE (m2:Meeting {start_time: 1521543600000, end_time:1521550800000}) // 11-1pm
CREATE (m3:Meeting {start_time: 1521550800000, end_time:1521558000000}) // 1-3pm
CREATE (m4:Meeting {start_time: 1521534600000, end_time:1521543600000}) // 8:30-11am
CREATE (m5:Meeting {start_time: 1521550800000, end_time:1521554400000}) // 1-2pm
CREATE (m6:Meeting {start_time: 1521561600000, end_time:1521565200000}) // 4-5pm
CREATE (m7:Meeting {start_time: 1521558000000, end_time:1521561600000}) // 3-4pm
CREATE (room1)-[:IS_BOOKED_ON_2018_03_20]->(m1)
CREATE (room1)-[:IS_BOOKED_ON_2018_03_20]->(m2)
CREATE (room1)-[:IS_BOOKED_ON_2018_03_20]->(m3)
CREATE (room2)-[:IS_BOOKED_ON_2018_03_20]->(m4)
CREATE (room2)-[:IS_BOOKED_ON_2018_03_20]->(m5)
CREATE (room2)-[:IS_BOOKED_ON_2018_03_20]->(m6)
CREATE (room4)-[:IS_BOOKED_ON_2018_03_20]->(m7)
CREATE (person2)-[:HAS_MEETING_ON_2018_03_20]->(m7)

This Cypher script creates this lovely set of data:

By looking at it, we can see how it is all connected, but it is not immediately obvious what times and rooms people are able to meet in. That is going to be our question: given a set of meeting attendees and a datetime range, find the available meeting times in the rooms that are on the same floor as at least one of the attendees. Let’s try building this query together one piece at a time. So, the first thing I want to do is find out what time ranges I have to eliminate because one of the attendees is already booked for another meeting.

MATCH (p:Person)
WHERE p.name IN ["Max", "Alex", "Andrew"]
OPTIONAL MATCH (p)-[:HAS_MEETING_ON_2018_03_20]->(m:Meeting)
RETURN p, m
╒═════════════════╤═════════════════════════════════════════════════════╕
│"p" │"m" │
╞═════════════════╪═════════════════════════════════════════════════════╡
│{"name":"Max"} │null │
├─────────────────┼─────────────────────────────────────────────────────┤
│{"name":"Alex"} │{"end_time":1521561600000,"start_time":1521558000000}│
├─────────────────┼─────────────────────────────────────────────────────┤
│{"name":"Andrew"}│null │
└─────────────────┴─────────────────────────────────────────────────────┘

So, it looks like Alex is busy from 3-4 PM. Next, we need to figure out where everyone sits, what floor they are in, and what rooms we are able to meet in. So, our query looks like this:

MATCH (p:Person)-[:SITS_IN]->(c:Cubicle)-[:LOCATED_IN]->(f:Floor)<-[:LOCATED_IN]-(r:Room)
WHERE p.name IN ["Max", "Alex", "Andrew"]
RETURN DISTINCT r
ORDER BY r.name

This gets us Rooms 1-4 as expected:

╒═════════════════╕
│"r" │
╞═════════════════╡
│{"name":"Room 1"}│
├─────────────────┤
│{"name":"Room 2"}│
├─────────────────┤
│{"name":"Room 3"}│
├─────────────────┤
│{"name":"Room 4"}│
└─────────────────┘

OK, so far, so good. Now, we need to know if those rooms have already been booked for other meetings today.

MATCH (p:Person)-[:SITS_IN]->(c:Cubicle)-[:LOCATED_IN]->(f:Floor)<-[:LOCATED_IN]-(r:Room)
WHERE p.name IN ["Max", "Alex", "Andrew"]
WITH r
OPTIONAL MATCH (r)-[:IS_BOOKED_ON_2018_03_20]->(m:Meeting)
WITH r, m ORDER BY m.start_time
RETURN r.name, COLLECT(DISTINCT m) AS meetings
ORDER BY r.name

This query tells us that Room 1 has three meetings scheduled; Room 2 has three, as well; Room 3 is wide open; Room 4 just has one. But it is really hard to see the actual times since they are shown as Longs.

╒════════╤══════════════════════════════════════════════════════════════════════╕
│"r.name"│"meetings" │
╞════════╪══════════════════════════════════════════════════════════════════════╡
│"Room 1"│[{"end_time":1521538200000,"start_time":1521534600000},{"end_time":152│
│ │1550800000,"start_time":1521543600000},{"end_time":1521558000000,"star│
│ │t_time":1521550800000}] │
├────────┼──────────────────────────────────────────────────────────────────────┤
│"Room 2"│[{"end_time":1521543600000,"start_time":1521534600000},{"end_time":152│
│ │1554400000,"start_time":1521550800000},{"end_time":1521565200000,"star│
│ │t_time":1521561600000}] │
├────────┼──────────────────────────────────────────────────────────────────────┤
│"Room 3"│[] │
├────────┼──────────────────────────────────────────────────────────────────────┤
│"Room 4"│[{"end_time":1521561600000,"start_time":1521558000000}] │
└────────┴──────────────────────────────────────────────────────────────────────┘

If you are using the Neo4j APOC plugin, you can use the apoc.date.format function to make them friendlier. In Neo4j 3.4, you will be able to use a built-in function, datetime.FromEpochMillis, for the same thing.

MATCH (p:Person)-[:SITS_IN]->(c:Cubicle)-[:LOCATED_IN]->(f:Floor)<-[:LOCATED_IN]-(r:Room)
WHERE p.name IN ["Max", "Alex", "Andrew"]
WITH r
OPTIONAL MATCH (r)-[:IS_BOOKED_ON_2018_03_20]->(m:Meeting)
WITH r, m ORDER BY m.start_time
RETURN r.name, EXTRACT (x IN COLLECT(DISTINCT m) | apoc.date.format(x.start_time,'ms','HH:mm') + ' to ' + apoc.date.format(x.end_time,'ms','HH:mm')) AS meetings
ORDER BY r.name

Here we go; now, that is way more readable:

╒════════╤════════════════════════════════════════════════════╕
│"r.name"│"meetings" │
╞════════╪════════════════════════════════════════════════════╡
│"Room 1"│["08:30 to 09:30","11:00 to 13:00","13:00 to 15:00"]│
├────────┼────────────────────────────────────────────────────┤
│"Room 2"│["08:30 to 11:00","13:00 to 14:00","16:00 to 17:00"]│
├────────┼────────────────────────────────────────────────────┤
│"Room 3"│[] │
├────────┼────────────────────────────────────────────────────┤
│"Room 4"│["15:00 to 16:00"] │
└────────┴────────────────────────────────────────────────────┘
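
As an aside, once Neo4j 3.4 is out, the same conversion should be possible without APOC using the map-based temporal constructor (a rough sketch; the exact constructor keys may differ in the final release):

// Convert the epoch-millisecond start time of the 8:30 meeting into a datetime value
RETURN datetime({epochMillis: 1521534600000}) AS meetingStart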

Alright, let’s combine the two queries together and see what rooms we can meet in and what times we can’t meet in those rooms because they are either already booked, or one of our attendees is busy:

MATCH (p:Person)
WHERE p.name IN ["Max", "Alex", "Andrew"]
OPTIONAL MATCH (p)-[:HAS_MEETING_ON_2018_03_20]->(m:Meeting)
WITH COLLECT(m) AS occupied
MATCH (p:Person)-[:SITS_IN]->(c:Cubicle)-[:LOCATED_IN]->(f:Floor)<-[:LOCATED_IN]-(r:Room)
WHERE p.name IN ["Max", "Alex", "Andrew"]
WITH r, occupied
OPTIONAL MATCH (r)-[:IS_BOOKED_ON_2018_03_20]->(m:Meeting)
WITH r, COLLECT(DISTINCT m) + occupied AS meetings
UNWIND meetings AS m
WITH r, m ORDER BY m.start_time
RETURN r.name, EXTRACT (x IN COLLECT(m) | apoc.date.format(x.start_time,'ms','HH:mm') + ' to ' + apoc.date.format(x.end_time,'ms','HH:mm')) AS meetings
ORDER BY r.name
╒════════╤═════════════════════════════════════════════════════════════════════╕
│"r.name"│"meetings" │
╞════════╪═════════════════════════════════════════════════════════════════════╡
│"Room 1"│["08:30 to 09:30","11:00 to 13:00","13:00 to 15:00","15:00 to 16:00"]│
├────────┼─────────────────────────────────────────────────────────────────────┤
│"Room 2"│["08:30 to 11:00","13:00 to 14:00","15:00 to 16:00","16:00 to 17:00"]│
├────────┼─────────────────────────────────────────────────────────────────────┤
│"Room 3"│["15:00 to 16:00"] │
├────────┼─────────────────────────────────────────────────────────────────────┤
│"Room 4"│["15:00 to 16:00","15:00 to 16:00"] │
└────────┴─────────────────────────────────────────────────────────────────────┘

Now, we could stop here and let our application mark those times as unavailable and call it a day. But what we really want is the opposite of that. We want the times that the rooms and attendees are available. So, how do we figure that out? Well, for each meeting, we want to find the next meeting start time for each room. The time slot between meetings is what we are after, defined by the entry’s end time and the start time of the next event. To perform this, we are going to use a double-unwind, which is basically “for each thing in the list, I want to pair it (get a cross product) with every other thing in the list.” Typically, this is the last thing you want to do since making a cross product can be very expensive, but it makes perfect sense for this query. We only care about the times where one meeting start time is greater than or equal to the other end time, and from these, we will grab our time slot as the query below shows:

MATCH (p:Person)
WHERE p.name IN ["Max", "Alex", "Andrew"]
OPTIONAL MATCH (p)-[:HAS_MEETING_ON_2018_03_20]->(m:Meeting)
WITH COLLECT(m) AS occupied
MATCH (p:Person)-[:SITS_IN]->(c:Cubicle)-[:LOCATED_IN]->(f:Floor)<-[:LOCATED_IN]-(r:Room)
WHERE p.name IN ["Max", "Alex", "Andrew"]
WITH r, occupied
OPTIONAL MATCH (r)-[:IS_BOOKED_ON_2018_03_20]->(m:Meeting)
WITH r, [{start_time:1521565200000, end_time:1521534600000}] + COLLECT(m) + occupied AS meetings
UNWIND meetings AS m
WITH r, [min(m.start_time), max(m.end_time)] AS rslot, COLLECT(m) AS mm
WITH r, rslot, mm
UNWIND mm AS m1
UNWIND mm AS m2
WITH r, rslot, m1, m2 WHERE (m2.start_time >= m1.end_time)
WITH r, rslot, [m1.end_time, min(m2.start_time)] AS slot
ORDER BY slot[0]
RETURN r.name, EXTRACT (x IN COLLECT(slot) | apoc.date.format(x[0],'ms','HH:mm') + ' to ' + apoc.date.format(x[1],'ms','HH:mm')) AS available
ORDER BY r.name

Our output looks close, but it’s not quite there. Rooms 3 and 4 look correct, but for Rooms 1 and 2, we have start times and end times that are the same:

╒════════╤══════════════════════════════════════════════════════════════════════╕
│"r.name"│"available" │
╞════════╪══════════════════════════════════════════════════════════════════════╡
│"Room 1"│["08:30 to 08:30","09:30 to 11:00","13:00 to 13:00","15:00 to 15:00","│
│ │16:00 to 17:00"] │
├────────┼──────────────────────────────────────────────────────────────────────┤
│"Room 2"│["08:30 to 08:30","11:00 to 13:00","14:00 to 15:00","16:00 to 16:00","│
│ │17:00 to 17:00"] │
├────────┼──────────────────────────────────────────────────────────────────────┤
│"Room 3"│["08:30 to 15:00","16:00 to 17:00"] │
├────────┼──────────────────────────────────────────────────────────────────────┤
│"Room 4"│["08:30 to 15:00","16:00 to 17:00"] │
└────────┴──────────────────────────────────────────────────────────────────────┘

So, let’s fix that by not allowing any slots that start and end at the same time:

MATCH (p:Person)
WHERE p.name IN ["Max", "Alex", "Andrew"]
OPTIONAL MATCH (p)-[:HAS_MEETING_ON_2018_03_20]->(m:Meeting)
WITH COLLECT(m) AS occupied
MATCH (p:Person)-[:SITS_IN]->(c:Cubicle)-[:LOCATED_IN]->(f:Floor)<-[:LOCATED_IN]-(r:Room)
WHERE p.name IN ["Max", "Alex", "Andrew"]
WITH r, occupied
OPTIONAL MATCH (r)-[:IS_BOOKED_ON_2018_03_20]->(m:Meeting)
WITH r, [{start_time:1521565200000, end_time:1521534600000}] + COLLECT(m) + occupied AS meetings
UNWIND meetings AS m
WITH r, [min(m.start_time), max(m.end_time)] AS rslot, COLLECT(m) AS mm
WITH r, rslot, mm
UNWIND mm AS m1
UNWIND mm AS m2
WITH r, rslot, m1, m2 WHERE (m2.start_time >= m1.end_time)
WITH r, rslot, [m1.end_time, min(m2.start_time)] AS slot
ORDER BY slot[0]
WITH r, [[1521534600000, rslot[0]]] + collect(slot) + [[rslot[1], 1521565200000]] AS open
WITH r, filter(x IN open WHERE x[0]<>x[1]) AS available
UNWIND available AS dups
WITH r, COLLECT(DISTINCT dups) AS tslots
RETURN r.name AS Room , EXTRACT (x IN tslots | apoc.date.format(x[0],'ms','HH:mm') + ' to ' + apoc.date.format(x[1],'ms','HH:mm')) AS Available
ORDER BY r.name

…and there we go:

╒════════╤═══════════════════════════════════╕
│"Room" │"Available" │
╞════════╪═══════════════════════════════════╡
│"Room 1"│["09:30 to 11:00","16:00 to 17:00"]│
├────────┼───────────────────────────────────┤
│"Room 2"│["11:00 to 13:00","14:00 to 15:00"]│
├────────┼───────────────────────────────────┤
│"Room 3"│["08:30 to 15:00","16:00 to 17:00"]│
├────────┼───────────────────────────────────┤
│"Room 4"│["08:30 to 15:00","16:00 to 17:00"]│
└────────┴───────────────────────────────────┘

Pretty neat, right? To be totally honest, I didn’t come up with this query by myself. I had a ton of help from Alex Price and Andrew Bowman.

I asked Michael Hunger, and he had another idea: ordering the meeting times and using lists and ranges instead of a double unwind to get the same answer. Here, he is also using apoc.date.parse('2018-03-20 08:30:00') instead of 1521534600000 to make the query more readable. Yes, these dates will be much nicer to work with in Neo4j 3.4… I can’t wait, either.

MATCH (p:Person)
WHERE p.name IN ["Max", "Alex", "Andrew"]
OPTIONAL MATCH (p)-[:HAS_MEETING_ON_2018_03_20]->(m:Meeting)
WITH COLLECT(m) AS occupied
MATCH (p:Person)-[:SITS_IN]->(c:Cubicle)-[:LOCATED_IN]->(f:Floor)<-[:LOCATED_IN]-(r:Room)
WHERE p.name IN ["Max", "Alex", "Andrew"]
WITH DISTINCT r, occupied
OPTIONAL MATCH (r)-[:IS_BOOKED_ON_2018_03_20]->(m:Meeting)
WITH r, occupied + COLLECT(m {.start_time, .end_time}) AS meetings
UNWIND meetings AS m
WITH r, m order by m.start_time
WITH r, COLLECT(m) as meetings
WITH r,meetings, {end_time:apoc.date.parse('2018-03-20 08:30:00')} + meetings + {start_time:apoc.date.parse('2018-03-20 17:00:00')} AS bookedSlots
WITH r, meetings,[idx in range(0,size(bookedSlots)-2) | {start_time:(bookedSlots[idx]).end_time,end_time:(bookedSlots[idx+1]).start_time}] as allSlots
WITH r, meetings,[slot IN allSlots WHERE slot.end_time - slot.start_time > 10*60*1000] as openSlots
WITH r, [slot IN openSlots WHERE NONE(m IN meetings WHERE slot.start_time < m.start_time < slot.end_time OR slot.start_time < m.end_time < slot.end_time)] as freeSlots
RETURN r, [slot IN freeSlots | apoc.date.format(slot.start_time,'ms','HH:mm')+" to "+apoc.date.format(slot.end_time,'ms','HH:mm')] as free
ORDER BY r.name;

If you want expert help with your Cypher queries (and anything else Neo4j), be sure to join our Neo4j Users Slack Group, where over 7,500 Neo4j users hang out.

Original Link

Theo 4.0 Release: The Swift Driver for Neo4j

Last week, I wrote about Graph Gopher, the Neo4j client for iPhone. I mentioned that it was built alongside version 4.0 of Theo, the Swift language driver for Neo4j. Today, we’ll explore the Theo 4.0 update in more detail.

But before we dive into the Theo update, let’s have a look at what Theo looks like with a few common code examples:

The code examples (shown as screenshots in the original post) cover:

  • Instantiating Theo
  • Creating a node and getting the newly created node back, complete with error handling
  • Looking up a node by ID, including error handling and handling if the node was not found
  • Performing a Cypher query and getting the results
  • Performing a Cypher query multiple times with different parameters as part of a transaction, then rolling it back

As you can see, it is very much in line with how you would expect Swift code to read, and it integrates with Neo4j very much how you would expect a Neo4j integration to be. So no hard learning curves, meaning you can start being productive right away.

What’s New in Theo 4.0

Now for the update story:

Theo 4.0 had a few goals:

  • Make a results-oriented API
  • Support Swift 4.0
  • Remove REST support

Theo 3.1 was our first version to support Bolt, and while the Bolt implementation has matured since then, it turned out to be very stable, memory-efficient, and fast right out of the gate.

We learned from using Theo 3 that a completion-block-based API that could throw exceptions, while easy to reason about, could be rather verbose, especially for doing many tasks in a transaction. For version 4, we explored – and ultimately decided upon – a Result type-based API.

That means that a request would still include a completion block, but it would be called with a Result type that would contain either the values successfully queried for, or an error describing the failure.

Screenshots in the original post compare Theo 3, with a throwing function and a regular completion block, to Theo 4, where the same example uses a Result type in the completion block instead.

This allowed us to add parsing that matched each query directly, so code using the driver could drop its own result parsing. For our example project, Theo-example, the result was a lot less code. That means less code to debug and maintain.

Screenshots in the original post show the Theo-example connection screen and main screen.

Theo 3.2 added Swift 4 support in addition to Swift 3. The main purpose of the Theo 4 release – other than incorporating the improvements made to the Bolt implementation – was to remove the REST client, which had been marked as deprecated in 3.2.

Having Theo 3.2 compatible with Swift 4 meant that projects using the REST client could use this as a target for a while going forward, giving them plenty of time to update. We committed to keeping this branch alive until Swift 5 arrived.

The main reason to remove the REST client was that the legacy Cypher HTTP endpoint it was using has been deprecated. This was the endpoint Theo 1 had been built around. Bolt is the preferred way for drivers, and hence it made little sense to adapt the REST client to the transactional Cypher HTTP endpoint that succeeds the legacy Cypher HTTP endpoint.

The result of these changes is an API that is really powerful, yet easy to use. The developer feedback we’ve gotten so far has been very positive. Theo 4 was in beta for a very long time and is now mature enough that we use it in our own products, such as Graph Gopher.

Going forward with Theo 4, the main plan is bug fixes, support for new Neo4j versions, and minor improvements based on community input.

Looking Forward to Theo 5.0

The next exciting part will be Theo 5, which will start taking shape as Swift 5 nears release.

The next major API change will be when Swift updates its concurrency model so that the API will stay close to the recommended Swift style. Specifically, we are hoping that Swift 5 will bring an async-await style concurrency model that we would then adapt to Theo. But it may very well be that this will have to wait until later Swift versions.

Other Ways to Connect to Neo4j Using the Swift Programming Language

If you think Theo is holding your hand too much, you can use Bolt directly through the Bolt-Swift project. The API is fairly straightforward to use, and hey, if you need an example project, you can always browse the Theo source code.

Another interesting project to come out of Theo and Bolt support is PackStream-Swift. PackStream is the format that Bolt uses to serialize objects, similar to the more popular MessagePack protocol. So if you simply need a way to archive your data or send it over a protocol other than Bolt, perhaps PackStream will fit your needs.

Give Us Your Feedback!

You can ask questions about Theo both on Stack Overflow (preferably) or in the #neo4j-swift channel in the neo4j-users Slack.

If you find issues, we’d love a pull request with a proposed solution. But even if you do not have a solution, please file an issue on the GitHub page.

We hope you enjoy using Theo 4.0!

Original Link

This Week in Neo4j: Property Based Access Control, Cypher, and User Path Analysis

Welcome to this week in Neo4j where we round up what’s been happening in the world of graph databases in the last 7 days.

This week we have a sneak peek at property based access control in Neo4j 3.4, user path analysis with Snowplow analytics, resources to get started with the Cypher query language, and more!

This week’s featured community member is Iryna Feuerstein, Software Engineer at PRODYNA – Neo4j Partner and sponsor of the GraphTour.

Iryna has been part of the Neo4j community for several years, is the organizer of the Düsseldorf Neo4j Meetup group, and has given a number of talks and workshops on Neo4j around the German-speaking region.

This week Iryna gave an introduction to Neo4j for kids at the JavaLand conference and a talk on modeling and importing each paragraph and section of the German laws into the graph.

Iryna’s work on importing and querying the Comparative Toxicogenomics Database is really interesting too in relating environmental factors to human health. She will give a workshop on this topic on May 25 in Berlin.

On behalf of the Neo4j community, thanks for all your work Iryna!

Keeping Properties Secret in Neo4j

We are frequently asked how to do property-based access control in Neo4j, and Max De Marzi has written a post in which he gives a sneak peek of this feature, which will be released in Neo4j 3.4.

Keeping properties secret in Neo4j

Max shows us how this works by going through an example based on node properties indicating the existence (or not!) of aliens. You can download an alpha version of Neo4j that has this feature from the other releases page of neo4j.com.

Intro to Cypher

This week we have a couple of excellent resources for getting started with the graph query language Cypher.

In Big Data analytics with Neo4j and Java, Part 1, Steven Haines shows how to model a social network in MySQL and Neo4j, using examples from the Neo4j in Action book.

He shows how to create and query a social graph of his family and their friends, with detailed explanations of Cypher’s CREATE and MATCH clauses.

If you prefer video content Esteve Serra Clavera released the Cypher Syntax part of his Introduction to Neo4j online course.

Neo4j-GraphQL, Extending R for Neo4j, Indie Music Network

On the Podcast: Dilyan Damyanov

This week on the podcast, Rik interviewed Dilyan Damyanov, Data Scientist at Snowplow Analytics.

They talk about Dilyan’s work on path analysis and how Snowplow has been able to use graphs to track people moving through the different stages of a marketing funnel and work out which marketing touch causes them to convert.

Dilyan also presented at the Neo4j Online Meetup where he showed how to write Cypher queries that enable this kind of analysis.

Next Week

What’s happening next week in the world of graph databases?

Tweet of the Week

My favourite tweet this week was by Daniel Gallagher:

Daniel Gallagher (@DanielGallagher): “Today I made the switch to Neo4j to feed @Graphistry. The natural ability to be able to draw inferred user relationships simply off of tweet interaction is awesome! I thought I had done something wrong here, but this led me directly to an account that is a weird anomaly…”

Don’t forget to RT if you liked it too.

Original Link

This Week in Neo4j: JavaScript CRUD Apps, Personalized Recommendation Engines, Graph Theory Tutorial

Welcome to this week in Neo4j, where we round up what’s been happening in the world of graph databases in the last seven days.

This week, we’ve got real-time food and event recommendation engines, a JavaScript OGM, a Neo4j Operational Dashboard, and more!

Featured Community Member: Meredith Broussard

This week’s featured community member is Meredith Broussard, Assistant Professor at New York University, with a focus on data-driven reporting, computational journalism, and data visualization.

Meredith has presented Neo4j workshops at NICAR 2017, showing attendees how to find connections in campaign finance data, and again in 2018, this time with a focus on social network analysis.

On behalf of the Neo4j and data journalism communities, thanks for all your work Meredith!

Recommendation Engines for Food Recipes and Events

This week, we have two stories about real-time recommendation engines: a use case where graph databases excel.

Irene Iriarte Carretero, last week’s featured community member, was interviewed by diginomica after her GraphTour London talk last week.

Irene explains how Gousto is using Neo4j to build a personalized recipe recommendation engine that takes “the subjective aspect” of cooking into account.

Suprfanz’s Jennifer Webb presented Data science in practice: Examining events in social media at the Strata Data Conference in San Jose.

In the talk, Jennifer shows how to build a recommendation engine for event promoters, starting from the community graph and using graph algorithms to find influencers. You can download the slides from Jennifer’s talk.

Neo4j Operational Dashboard, JavaScript OGM, Graphs for Identity

Geek Out: Graph Theory Tutorial

I came across Michel Caradec’s excellent workshop about implementing graph theory with Neo4j.

Michel set himself the challenge of implementing graph theory concepts using pure Cypher, and in the tutorial, he shows how to create random graphs, extract subgraphs, generate adjacency matrices, and more.

If you geek out on graph theory, you’re going to love this tutorial.

Tweet of the Week


That’s all for this week!

Original Link

Graph Gopher: The Neo4j Browser Built on Swift for Your iOS Device

Graph Gopher is a Neo4j browser for iPhone that was recently released to the App Store.

Graph Gopher lets you interact natively with your Neo4j graphs through easy browsing and quick entry for new nodes and relationships. It gives you a full Cypher client at your fingertips and fast editing of your existing data.

Screenshots in the original post show the main features:

  • Start by browsing labeled nodes or relationships or their property keys.
  • A full Cypher client ready at your fingertips.
  • Quickly add new nodes and relationships.
  • Easily edit nodes and relationships.
  • See which relationships share a common property key, in this case, createdDate.

How Graph Gopher Got Started

Graph Gopher came out of a few questions I explored. First of all, I was exploring different ways to browse the graphs stored in my Neo4j graph database. The graph visualization of a Cypher query that we know from the Neo4j web interface was one alternative, but I thought it required quite a bit from the user to start exploring, and it was perhaps not as good a fit on a phone-sized device.

After spending a lot of time trying to adapt that, I found that the classic navigation interface worked well for exploring the graph. To me, the navigation interface looks a lot like Gopher, the navigation paradigm we used to explore the internet before web browsers, and hence the name was born.

Building Graph Gopher in Swift

The second road to Graph Gopher was that Swift – a language used to write iOS apps – had become open source, and it was starting to be used to write server applications. While databases like MySQL and SQLite were available and used by many, Neo4j was absent.

I knew I could do something about that, and joined Cory Wiles’s Theo project in late 2016. After completing the Swift 3.0 transition together with him, I implemented Bolt support for 3.1 and 3.2.

For version 4.0, I improved the API, made it support Swift 4, and made it a lot easier to use. I used the development of Graph Gopher to validate the work done there, and Graph Gopher is a great demonstration of what you can do with Theo. Along the way, other developers started using the betas of Theo 4, giving me great feedback.

Faster Than the Neo4j Browser and Available Wherever You Need It

An ambition for Graph Gopher was to be far faster to load and use than opening the web interface in a browser tab and interacting with your Neo4j instance that way. In practice, it has been no contest: it is a very convenient tool. Even though I use a Mac all through my working day, I still access my Neo4j instances primarily through Graph Gopher.

The exception to this is when I write longer Cypher statements as part of my development work, but I have gotten good feedback on how to improve this. Look forward to updates here in the coming versions.

In practice, Graph Gopher makes it so that you always have your Neo4j instance available to you. It helps you add or edit nodes and relationships, prototype ideas and look up queries from your couch, coming out of the shower, on the train, or wherever you are. That is wonderfully powerful.

Another important feature is multi-device support. I use both an iPhone and an iPad, and I know people will use it on both work and private devices. Therefore it was important to me that session configuration was effortlessly transferred between devices, as well as favorite nodes. This has been implemented using iCloud so that if you add a new instance configuration on one device, it will be available to all devices using the same iCloud account.

Connectivity is a particular concern on mobile devices, and a lot of work was done to help Graph Gopher keep a stable connection over flaky networks. If the connection still drops, it will reconnect and allow you to continue working where you left off.

The Future of Graph Gopher

The road forward with Graph Gopher will be exciting. Now that it is out, I get contacted by people in situations I hadn’t imagined at all. Where people use it will be the primary driver of what features get added and how it will evolve. I would absolutely love to hear back from you how you use it, or how you would like to use it.

Original Link

Neo4j: A Reasonable RDF Graph Database and Reasoning Engine

It is widely known that Neo4j is able to load and write RDF. Until now, RDF and OWL reasoning have been attributed to fully fledged triple stores or dedicated reasoning engines only. This post shows that Neo4j can be extended by a unique reasoning technology to deliver a very expressive and highly competitive reasoning engine for RDF, RDFS, and OWL 2 RL. I will briefly illustrate the approach and provide some benchmark results.

Labeled property graphs (LPG) and the resource description framework (RDF) have a common ground: both consider data as a graph. Not surprisingly, there are ways of converting one format into the other, as recently demonstrated nicely by Jesús Barrasa from Neo4j for the Thomson Reuters PermID RDF dataset.

If you insist on differences between LPG and RDF, then consider the varying abilities to represent schema information and reasoning.

In Neo4j 2.0, node labels were introduced for typing nodes to optionally encode a lightweight type schema for a graph. Broadly speaking, RDF Schema (RDFS) extends this approach more formally. RDFS allows structuring labels of nodes (called classes in RDF) and relationships (called properties) in hierarchies. On top of this, the Web Ontology Language (OWL) provides a language to express rule-like conditions to automatically derive new facts such as node labels or relationships.

Reasoning Enriches Data With Knowledge

For a quick dive into the world of rules and OWL reasoning, let’s consider the very popular LUBM benchmark (Lehigh University Benchmark).

The benchmark consists of artificially generated graph data in a fictional university domain and deals with people, departments, courses, etc. As an example, a student is derived to be an attendee if he or she takes some course, that is, when he or she matches the following ontological rule:

Student and (takesCourse some) SubClassOf Attendee

This rule has to be read as follows when translated into LPG lingo: every node with label Student that has some relationship with label takesCourse to some other node will receive the label Attendee. Any experienced Neo4j programmer may rub his or her hands since this rule can be translated straightforwardly into the following Cypher expression:

match (x:Student)-[:takesCourse]->()
set x:Attendee

That is perfectly possible but could become cumbersome in the case of deeply nested rules that may also depend on each other. For instance, the Cypher expression misses the subclasses of Student, such as UndergraduateStudent. Strictly speaking, the expression above should therefore read:

match (x)-[:takesCourse]->() where x:Student or x:UndergraduateStudent
set x:Attendee

It’s obviously more convenient to encode such domain knowledge as an ontological rule with the support of an ontology editor such as Protégé and an OWL reasoning engine that takes care of executing them.

Another nice thing about RDFS/OWL is that modeling such knowledge is on a very declarative level that is standardized by W3C. In addition, the OWL language bears some important properties such as soundness and completeness.

For instance, you can never define a non-terminating rule set, and reasoning will instantly identify any conflicting rules. In case of OWL 2 RL, it is furthermore guaranteed that all derivable facts can be derived in polynomial time (theoretical worst case) with respect to the size of the graph.

In practice, performance can vary a lot, of course. In the case of our Attendee example, a reasoner — regardless of whether it is a triple store rule engine or a Cypher engine — has to loop over the graph nodes with label Student and check for takesCourse relations.

To tweak performance, one could use dedicated indexes to effectively select nodes with particular relations (or relation degrees) or labels, as well as use stored procedures. At the end of the day, it seems that this does not scale well: when you double the data, you double the number of graph reads and writes needed to compute the consequences of such rules.

The good news is that this is not the end of the story.

Efficient Reasoning for Graph Storage

There is a technology called GraphScale that empowers Neo4j with scalable OWL reasoning. The approach is based on an abstraction refinement technique that builds a compact representation of the graph suitable for in-memory reasoning. Reasoning consequences are then incrementally propagated back to the underlying graph store.

The idea behind GraphScale is based on the observation that entities within a graph often have a similar structure. The GraphScale approach takes advantage of these similarities and computes a condensed version of the original data called an abstraction.

This abstraction is based on equivalence groups of nodes that share a similar structure according to well-defined logical criteria. This technique is proven to be sound and complete for all of RDF, RDFS, and OWL 2 RL.

How the Neo4j graph database (vs. a triple store) performs as an RDF reasoning engine

Here is an intuitive idea of the approach. Consider the graph above as a fraction of the original data about the university domain in Neo4j. On the right, there is a compact representation of the undergraduate students that take at least some course.

In essence, the derived fact that those students are attendees implicitly holds for all source nodes in the original graph. In other words, there is some one-to-many relationship from derived facts in the compact representation to nodes in the original graph.

Reasoning and Querying Neo4j With GraphScale

Let’s look at some performance results with data of increasing size from the LUBM test suite.

The following chart depicts the time to derive all derivable facts (called materialization) with GraphScale on top of Neo4j (excluding loading times) for 50, 100, and 250 universities. In comparison to other secondary-storage systems with reasoning capabilities, the Neo4j-GraphScale duo shows a much lower growth in reasoning time as the data increases than any other system (schema and data files can be found at the bottom of this post).

A benchmark of GraphScale + Neo4j using the LUBM test suite

Experience has shown that materialization is key to efficient querying in a real-world setting. Without upfront materialization, a reasoning-aware triple store has to temporarily derive all answers and relevant facts for every single query on demand. Consequently, this comes with a performance penalty and typically fails on non-trivial rule sets.

Since the Neo4j graph database is not a triple store, it is not equipped with a SPARQL query engine. However, Neo4j offers Cypher and for many semantic applications, it should be possible to translate SPARQL to Cypher queries.

From a user perspective, this integrates two technologies into one platform: a transactional graph analytics system as well as an RDFS/OWL reasoning engine able to service sophisticated semantic applications via Cypher over a materialized graph in Neo4j.

As a proof of concept, let’s consider SPARQL query number nine from the LUBM test suite that turned out to be one of the most challenging out of the 14 given queries. The query asks for students and their advisors which teach courses taken by those students: a triangular relationship pattern over most of the dataset:

SELECT ?X ?Y ?Z {
  ?X rdf:type Student .
  ?Y rdf:type Faculty .
  ?Z rdf:type Course .
  ?X advisor ?Y .
  ?Y teacherOf ?Z .
  ?X takesCourse ?Z
}

Under the assumption of a fully materialized graph, this SPARQL query translates into the following Cypher query:

MATCH (x:Student)-[:takesCourse]->(z:Course), (x)-[:advisor]->(y:Faculty)-[:teacherOf]->(z)
RETURN x, y, z

Without a doubt, the Neo4j Cypher engine delivers competitive query performance on the previous datasets (times are for the corresponding count(*) version of query nine). Triple store A is not listed since it is a pure in-memory system without secondary-storage persistence.

Benchmark data: Neo4j + Cypher + GraphScale vs. a triple store

There is more potential in the marriage of Neo4j and the GraphScale technology. In fact, the graph abstraction can be very helpful as an index for query answering. For instance, you can instantly read from the abstraction whether there is any data matching query patterns of the kind (x:Label)-[:RELTYPE]->().
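
As a rough illustration, this is the kind of existence check such an abstraction can answer without scanning the full graph (written here as plain Cypher against the LUBM example above, for comparison):

// Is there any Student node with at least one outgoing takesCourse relationship?
MATCH (x:Student)-[:takesCourse]->()
RETURN count(x) > 0 AS hasMatches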

Bottom line: I fully agree with George Anadiotis’ statement that labeled property graphs and RDF/OWL are close relatives.

In a follow-up blog post, I will present an interactive visual exploration and querying tool for RDF graphs that utilizes the compact representation described above as an index to deliver a distinguished user experience and performance on large graphs.

Resources

GraphScale:

  • GraphScale: Adding Expressive Reasoning to Semantic Data Stores. Demo Proceedings of the 14th International Semantic Web Conference (ISWC 2015): http://ceur-ws.org/Vol-1486/paper_117.pdf
  • Abstraction refinement for scalable type reasoning in ontology-based data repositories: EP 2 966 600 A1 & US 2016/0004965 A1

Data:

Original Link

Mixing Specified and Unspecified Group Belongings in a Single Import Isn’t Supported

I’ve been working with the Neo4j Import Tool recently after a bit of a break and ran into an interesting error message that I initially didn’t understand.

I had some CSV files containing nodes that I wanted to import into Neo4j. Their contents look like this:

$ cat people_header.csv
name:ID(Person)

$ cat people.csv
"Mark"
"Michael"
"Ryan"
"Will"
"Jennifer"
"Karin"

$ cat companies_header.csv
name:ID(Company)

$ cat companies.csv
"Neo4j"

I find it easier to use separate header files because I often make typos with my column names and it’s easier to update a single line file than to open a multi-million line file and change the first line.

I ran the following command to create a new Neo4j database from these files:

$ ./bin/neo4j-admin import \
--database=blog.db \
--mode=csv \
--nodes:Person people_header.csv,people.csv \
--nodes:Company companies_heade.csv,companies.csv

Which resulted in this error message:

Neo4j version: 3.3.3
Importing the contents of these files into /Users/markneedham/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-b59e33d5-2060-4a5d-bdb8-0b9f6dc919fa/installation-3.3.3/data/databases/blog.db:
Nodes:
  :Person
  /Users/markneedham/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-b59e33d5-2060-4a5d-bdb8-0b9f6dc919fa/installation-3.3.3/people_header.csv
  /Users/markneedham/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-b59e33d5-2060-4a5d-bdb8-0b9f6dc919fa/installation-3.3.3/people.csv
  :Company
  /Users/markneedham/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-b59e33d5-2060-4a5d-bdb8-0b9f6dc919fa/installation-3.3.3/companies.csv
...
Import error: Mixing specified and unspecified group belongings in a single import isn't supported
Caused by:Mixing specified and unspecified group belongings in a single import isn't supported
java.lang.IllegalStateException: Mixing specified and unspecified group belongings in a single import isn't supported
at org.neo4j.unsafe.impl.batchimport.input.Groups.getOrCreate(Groups.java:52)
at org.neo4j.unsafe.impl.batchimport.input.csv.InputNodeDeserialization.initialize(InputNodeDeserialization.java:60)
at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.initialize(InputEntityDeserializer.java:68)
at org.neo4j.unsafe.impl.batchimport.input.csv.ParallelInputEntityDeserializer.lambda$new$0(ParallelInputEntityDeserializer.java:104)
at org.neo4j.unsafe.impl.batchimport.staging.TicketedProcessing.lambda$submit$1(TicketedProcessing.java:103)
at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:237)

The output actually helpfully indicates which files it’s importing from and we can see under the :Company section that the header file is missing.

As a result of the typo I made when trying to type companies_header.csv, the tool now treats the first line of companies.csv as the header, and since we haven’t specified a group (i.e. Company, Person) on that line, we receive this error.

Let’s fix the typo and try again:

$ ./bin/neo4j-admin import \
--database=blog.db \
--mode=csv \
--nodes:Person people_header.csv,people.csv \
--nodes:Company companies_header.csv,companies.csv

Neo4j version: 3.3.3
Importing the contents of these files into /Users/markneedham/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-b59e33d5-2060-4a5d-bdb8-0b9f6dc919fa/installation-3.3.3/data/databases/blog.db:
Nodes:
  :Person
  /Users/markneedham/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-b59e33d5-2060-4a5d-bdb8-0b9f6dc919fa/installation-3.3.3/people_header.csv
  /Users/markneedham/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-b59e33d5-2060-4a5d-bdb8-0b9f6dc919fa/installation-3.3.3/people.csv
  :Company
  /Users/markneedham/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-b59e33d5-2060-4a5d-bdb8-0b9f6dc919fa/installation-3.3.3/companies_header.csv
  /Users/markneedham/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-b59e33d5-2060-4a5d-bdb8-0b9f6dc919fa/installation-3.3.3/companies.csv
...
IMPORT DONE in 1s 5ms.
Imported:
  7 nodes
  0 relationships
  7 properties
Peak memory usage: 480.00 MB

Success!
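
As a quick sanity check (not part of the original post), a small Cypher query against the freshly imported blog.db database can confirm the counts reported above:

// Count the imported nodes by label; we expect 6 Person nodes and 1 Company node
MATCH (n)
RETURN labels(n) AS label, count(*) AS nodes
ORDER BY nodes DESC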

Original Link

Exploring Azure Cosmos DB With Gremlin Graph Database

In this article, we will learn about Azure Cosmos DB. We’ll get started with creating an Azure Cosmos DB account with the Graph API. Also, we’ll create a graph database and collections using Azure Portal.

Prerequisite

  • Microsoft Azure subscription

Overview

Azure Cosmos DB is Microsoft’s globally distributed, multi-model database that scales at the click of a button. Because it is multi-model, it supports document, table, and graph data together in a single database.

Azure Cosmos DB provides excellent throughput, latency, availability, and consistency guarantees with comprehensive service-level agreements (SLAs), which, as of now, no other database provider is offering.

Below are the key features of Azure Cosmos DB:

  • Turnkey global distribution.
  • Multi-level data models.
  • Scale on demand.
  • High response with lower latency.
  • High availability.
  • Consistency models.
  • Money-back guarantee with SLA.

Use cases that are highly distributed, mission-critical, and require high availability can best be handled by having Azure Cosmos DB as one of the application’s architecture components.

Following are a few use cases recommended by experts:

  • IoT
  • Gaming
  • Retailing
  • Marketing

Creating Cosmos DB Using Azure Portal

Open Microsoft Azure Portal. Click + Create New resource > Database > Cosmos DB.

Alternatively, type cosmos db in the search text box and click Azure Cosmos DB under the services listed as a result.


Enter the following required details to create Cosmos DB:

  • ID: This needs to be globally unique. The ID, combined with documents.azure.com, forms the URI used to access the service. For this blog, the ID is azure-cosmos-demo, so once the service is provisioned, the URI will be https://azure-cosmos-demo.documents.azure.com/.
  • API: This is the most important section. At the time of writing, Cosmos DB supports the following five APIs, which can be seen in the following image:
  1. SQL
  2. MongoDB
  3. Cassandra
  4. Azure Table
  5. Gremlin (graph)


For this blog, we will select Gremlin (graph).

  • Subscription: Here, you can select any of your multiple subscriptions. For this article, we will select the subscription Free Trial.
  • Resource Group: Cosmos DB is an Azure resource, and hence it needs to be grouped into a logical container: the resource group. As per your requirements, select an existing resource group or create a new one. For this article, we are creating a new resource group: cosmosdb.
  • Location: Select as per your predicted user’s geolocation. For this article, we selected East US.

We’re not checking geo-redundancy, as Cosmos DB has a cool feature of replicating the database at any geolocation upon a click. We’ll look at this in another article.

Pin it to the dashboard and click Create to start the deployment of your service.


We can trace the deployment status on the dashboard. It may take a few minutes to deploy.


Once deployed, a notification will pop up and open up the “quick start” section.


Note: It still refers to the service as Microsoft.DocumentDB.

Navigate to the resource: azure-cosmos-demo > Overview.

We can find the URI, the read and write locations, the subscription, the resource group, and many other important details in this section.

We can enable geo-redundancy here. The Data Explorer, where we can play with the graph database, can also be reached from this section. Refer to the image below and the area highlighted in green.


Another important element used for communicating with any Azure resource, including Azure Cosmos DB, is the set of keys.

Cosmos DB comes with two types of keys:

  1. Read-write keys: With the default URI, these include a primary key, secondary key, primary connection string, and secondary connection string. We can regenerate these keys at any point in time.

  2. Read-only keys: Again with the same default URI, these include a primary key, secondary key, primary connection string, and secondary connection string. We can regenerate these keys at any point in time.

Consistency

Session is the default consistency level for Azure Cosmos DB.

Session consistency is the most widely used consistency level for both single-region and globally distributed applications. It provides write latencies, availability, and read throughput comparable to those of eventual consistency, but also provides consistency guarantees that suit the needs of applications written to operate in the context of a user.


You can change the consistency by navigating to Resource > Settings > Default Consistency > Save.

Once the change is saved, you will be notified. It takes a few seconds to update.

For the demo, select EVENTUAL.


Creating Graph Database

Now that we are done creating an Azure Cosmos DB account with Graph as the API, let’s create our graph database.

For creating a graph database, navigate to Resource > Data Explorer > New Graph.


It will present us with a blade to enter the following details with respect to the graph database:

  • Database ID: A database is a logical container of one or more collections. It’s an identifier for the database. For this article, we will name it demoDatabase.
  • Graph ID: A unique identifier for collections under the database. It’s also used for ID-based routing through REST and all SDKs. For this article, we will name it demoCollections.
  • Storage capacity: This is the maximum storage size of the collection. Billing is based on consumption per GB. We can choose between 10 GB fixed and unlimited. Let’s go with 10 GB fixed.
  • Throughput: Collections are provisioned with throughput in request units per second (RU/s); 1 RU corresponds to the throughput of a read of a 1 KB document. We can select a value in the range of 400-10,000 RU/s. Let’s go with the minimum of 400 RU/s. The blade also displays the estimated hourly and daily cost for the selected throughput.

Note: If you select unlimited storage capacity, you can provision throughput in the 1,000-100,000 RU/s range. You also need to set a partition key.

Once done with entering all fields, click OK to proceed.


In a few seconds, it will create the desired database, which will be listed under Data Explorer.


Summary

In this article, we learned how quickly we can create a super cool database with Microsoft Azure Cosmos DB. We will add vertices and edges to our graph database in upcoming articles.

Original Link

Visualizing the Amazon Neptune Database With KeyLines

In November 2017, Amazon launched a limited preview of Amazon Neptune, a hosted graph database service with an engine optimized for storing billions of relationships and querying the graph with milliseconds of latency. This new service lets developers focus more on their applications and less on database management.

What’s special about Neptune is that it supports two different open standards for describing and querying data:

  1. Gremlin, a graph traversal language from Apache TinkerPop.

  2. Resource Description Framework (RDF) queried with SPARQL, a declarative language based on Semantic Web standards from W3C.

We’re big fans of both approaches, and KeyLines can work with either. So, we thought we’d check out Neptune to see how easily it can be integrated with KeyLines.

Integrating KeyLines With Amazon Neptune

Step 1: Launch Amazon Neptune

Launching the Amazon Neptune database was pretty straightforward thanks to the quick start guide.

Neptune runs inside your own Amazon Virtual Private Cloud (VPC), which you then add to your own Amazon EC2 instance. You manage all that using a launch wizard in the Neptune console.

Once it’s launched, you can configure database options (parameter group, port, cluster name, etc.). In our example, we used the connection endpoint Amazon provides:

neptune-test.cxmaujvq0cze.us-east-1-beta.rds.amazonaws.com 

That’s all we need to know to start using the database instance.

Step 2: Load Some Data

Next, we need to load data into the Neptune database. Your data files have to be in one of the following formats:

  • CSV for the property graph/Gremlin.

  • N-Triples, N-Quads, RDF/XML, or Turtle for RDF/SPARQL.

As we’ve mentioned, there are two different query engines that can be used with Neptune. In this example, we’re showing how to connect to the SPARQL endpoint with /sparql.

We used a movie dataset, representing films and the actors in them, in Turtle format (.ttl), a textual representation of an RDF graph. Here’s what it looks like:

@prefix imdb: <http://www.imdb.com/>.
@prefix dbo: <http://dbpedia.org/ontology/>.
@prefix mo: <http://www.movieontology.org/2009/10/01/movieontology.owl#>.

<http://imdb.com/movie/Avatar> a mo:Movie; imdb:hasTitle "Avatar"; mo:hasActor <http://imdb.com/actor/Sam_Worthington>; imdb:imageUrl "http://cf1.imgobject.com/posters/374/4bd29ddd017a3c63e8000374/avatar-mid.jpg".
<http://imdb.com/actor/Sam_Worthington> a dbo:Actor; imdb:hasName "Sam Worthington".
<http://imdb.com/movie/Pirates_of_the_Caribbean:_The_Curse_of_the_Black_Pearl> a mo:Movie; imdb:hasTitle "Pirates of the Caribbean: The Curse of the Black Pearl"; mo:hasActor <http://imdb.com/actor/Zoe_Saldana>; imdb:imageUrl "http://cf1.imgobject.com/posters/242/4bc9018b017a3c57fe000242/pirates-of-the-caribbean-the-curse-of-the-black-pearl-mid.jpg".
<http://imdb.com/actor/Zoe_Saldana> a dbo:Actor; imdb:hasName "Zoe Saldana".
<http://imdb.com/movie/Avatar> a mo:Movie; imdb:hasTitle "Avatar"; mo:hasActor <http://imdb.com/actor/Zoe_Saldana>; imdb:imageUrl "http://cf1.imgobject.com/posters/374/4bd29ddd017a3c63e8000374/avatar-mid.jpg".
[...]

Step 3: Send Queries to Amazon Neptune

The next step is to copy the data to an Amazon S3 (Simple Storage Service) bucket. It’s important to remember that the S3 bucket must be in the same AWS Region (us-east-1 is the only region available at the time of writing) as the cluster that loads the data.

To run the Neptune loader, at the command line, enter:

curl -X POST \
  -H 'Content-Type: application/json' \
  http://neptune-test.cxmaujvq0cze.us-east-1-beta.rds.amazonaws.com:8182/loader \
  -d '{
    "source" : "s3://camintel-neptune/movies.ttl",
    "format" : "turtle",
    "region" : "us-east-1",
    "failOnError" : "FALSE"
  }'

Which returns:

{ "status" : "200 OK", "payload" : { "loadId" : "2cafaa88-5cce-43c9-89cd-c1e68f4d0f53" }
}

It’s not the most informative response, but it tells us that our data was successfully loaded.

Now that we have successfully added data into the Neptune instance, we can use a SPARQL query to retrieve information and explore the database. The template query we used was:

curl -X POST --data-binary 'my-query' https://your-neptune-endpoint:8182/sparql

…where 'my-query'  is of the form:

query=prefix mo: <http://www.movieontology.org/2009/10/01/movieontology.owl#> prefix imdb: <http://www.imdb.com/> SELECT DISTINCT ?actor ?title ?img ?name WHERE { <http://imdb.com/movie/The_Matrix> mo:hasActor ?actor; imdb:hasTitle ?title; imdb:imageUrl ?img. ?actor imdb:hasName ?name.}

So, to get back the actors from a given movie (e.g. The Matrix), or to find the movies a certain actor (e.g. Gloria Foster) appeared in, we submitted a query like this:

query=prefix mo: <http://www.movieontology.org/2009/10/01/movieontology.owl#> prefix imdb: <http://www.imdb.com/> SELECT DISTINCT ?movie ?title ?img ?name WHERE {?movie mo:hasActor <http://imdb.com/actor/Gloria_Foster>; imdb:hasTitle ?title; imdb:imageUrl ?img. <http://imdb.com/actor/Gloria_Foster> imdb:hasName ?name.}

Now, let’s parse the data.

Step 4: Format the Data

In our case, we just need to format the JSON data returned from our SPARQL queries into a KeyLines JSON object that details nodes and links. 

In the query example below, we're requesting the nodes and links connected to a specific node (baseNode). The response we get back (either the actors in a specific movie or the movies a specific actor appeared in) is stored in the object called json. KeyLines nodes and links are created by calling makeNode and makeLink.

function makeKeyLinesItems(json, baseNode) {
  var items = [];
  items.push(baseNode);
  if (json.results.bindings) {
    json.results.bindings.forEach(function (item) {
      var node = item.actor
        ? makeNode(item.actor.value, "actor", item)
        : makeNode(item.movie.value, "movie", item);
      items.push(node);
      items.push(makeLink(item, node, baseNode));
    });
  }
  return items;
}

function makeNode(id, type, item) {
  var isActor = type === 'actor';
  var node = {
    type: 'node',
    id,
    ci: true,
    e: isActor ? 1 : 2,
    d: { type: type },
    t: isActor ? item.name.value : item.title.value
  };
  return node;
}

function makeLink(item, node, baseNode) {
  const id1 = item.actor ? node.id : baseNode.id;
  const id2 = item.actor ? baseNode.id : node.id;
  const id = [id1, id2].sort().join('-');
  var link = {
    type: 'link',
    id,
    id1,
    id2,
    fc: 'rgba(52,52,52,0.9)',
    c: 'rgb(0,153,255)',
    w: 2
  };
  return link;
}

Step 5: Load the Data Into KeyLines and Start Customizing Your App

All we need to do is load our data into the KeyLines chart. The makeKeyLinesItems() function turns the Neptune response into KeyLines nodes and links, which we can then load into the chart using chart.load().

By now, we have a simple working prototype of a graph visualization tool, running on an Amazon Neptune back-end.

It might look fairly basic right now, but it’s easy to get your KeyLines app looking good through customization and styling. 

Original Link

What’s Waiting for You in the Latest Release of the APOC Library [March 2018]

The last release of the APOC library was just before GraphConnect New York, and in the meantime, quite a lot of new features have made their way into our little standard library.

We also crossed 500 GitHub stars, thanks everyone for giving us a nod!

What’s New in the Latest APOC Release

Image: Warner Bros.

If you haven’t used APOC yet, you have one less excuse: it just became much easier to try. In Neo4j Desktop, just navigate to the Plugins tab of your Manage Database view, and click Install for APOC. Then your database is restarted, and you’re ready to rock.

APOC wouldn’t be where it is today without the countless people contributing, reporting ideas and issues, and telling their friends. Please keep up the good work.

I also added a code of conduct and contribution guidelines to APOC, so every contributor feels welcome and safe and also quickly knows how to join our efforts.

For this release again, our friends at LARUS BA did a lot of the work. Besides many bug fixes, Angelo Busato also added S3 URL support, which is really cool. Andrea Santurbano also worked on the HDFS support (read/write).

With these, you can use S3 and HDFS URLs in every procedure that loads data, like apoc.load.json/csv/xml/graphml, apoc.cypher.runFile, etc. Writing to HDFS is possible with all the export functions, like apoc.export.cypher/csv/graphml.
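
For example, a minimal sketch of loading JSON straight from S3 might look like the following. The bucket, file, and Person properties are hypothetical, and the exact placement of credentials in the S3 URL can vary with your APOC version and setup:

// Hypothetical S3 location; apoc.load.json yields one `value` map per JSON object.
CALL apoc.load.json('s3://accessKey:secretKey@my-bucket/people.json')
YIELD value
MERGE (p:Person {name: value.name})
SET p.age = value.age;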

Andrew Bowman worked on a number of improvements around path expanders, including:

  • Added support for repeating sequences of labels and/or rel-types to express more complex paths.
  • Support for known end nodes (instead of end nodes based only on labels).
  • Support for compound labels (such as :Person:Manager).

I also found some time to code and added a bunch of things. 

Aggregation Functions

I had wanted to add aggregation functions ever since Neo4j 3.2, when Pontus added the capability, but I just never got around to it. Below is one of the patterns we used to use to get the first (few) elements of a collect. It is quite inefficient because the full collect list is built up even if you’re just interested in the first element:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p,m ORDER BY m.released
RETURN p, collect(m)[0] as firstMovie

Now, you can just use:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p,m ORDER BY m.released
RETURN p, apoc.agg.first(m) as firstMovie

There are also some more statistics functions, including apoc.agg.statistics, which computes all of them at once and returns a map with {min, max, sum, median, avg, stdev}. The other statistics-related additions include the following (a short usage example follows the list):

  • More efficient variants of collect(x)[a..b]
  • apoc.agg.nth, apoc.agg.first, apoc.agg.last, apoc.agg.slice
  • apoc.agg.median(x)
  • apoc.agg.percentiles(x,[0.5,0.9])
  • apoc.agg.product(x)
  • apoc.agg.statistics() provides a full numeric statistic
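
For instance, a minimal sketch using the movie graph (the m.released property is purely an illustration) could look like this:

// Aggregate release years per person: full statistics map plus selected percentiles.
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
RETURN p.name,
       apoc.agg.statistics(m.released) AS releaseStats,
       apoc.agg.percentiles(m.released, [0.5, 0.9]) AS releasePercentiles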

Indexing

I implemented an idea from my colleague Ryan Boyd to allow indexing of full “documents,” i.e. map structures per node or relationship that can also contain information from the neighborhood or computed data. Later, you can search across the keys and values of the indexed data.

MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WITH p, p {.name, .age, roles: r.roles, movies: collect(m.title) } as doc
CALL apoc.index.addNodeMap(p, doc);

Then, later you can search:

CALL apoc.index.nodes('Person','name:K* movies:Matrix roles:Neo');

The procedures behind this are:

apoc.index.addNodeMap(node, {map})
apoc.index.addRelationshipMap(rel, {map})

As part of that work, I also wanted to add support for deconstructing complex values or structs, such as:

  • apoc.map.values to select the values of a subset of keys into a mixed type list.
  • apoc.coll.elements is used to deconstruct a sublist into typed variables (this can also be done with WITH, but that requires an extra declaration of the list and is less concise).

RETURN apoc.map.values({a:'foo', b:42, c:true}, ["a","c"]) // -> ['foo', true]
CALL apoc.coll.elements([42, 'foo', person]) YIELD _1i as answer, _2s as name, _3n as person

Path Expander Sequences

You can now define repeating sequences of node labels or relationship types during expansion. Just use commas in the relationshipFilter and labelFilter config parameters to separate the filters that should apply for each step in the sequence.

relationshipFilter:'OWNS_STOCK_IN>, <MANAGES, LIVES_WITH>|MARRIED_TO>|RELATED'

The above will continue traversing only the given sequence of relationships.

labelFilter:'Person|Investor|-Cleared, Company|>Bank|/Government:Company'

All filter types are allowed in label sequences. The above repeats a sequence of a :Person or :Investor node (but not with a :Cleared label), and then a :Company, :Bank, or :Government:Company node (where :Bank nodes will act as end nodes of an expansion, and :Government:Company nodes will act as end nodes and terminate further expansion).

sequence:'Person|Investor|-Cleared, OWNS_STOCK_IN>, Company|>Bank|/Government:Company, <MANAGES, LIVES_WITH>|MARRIED_TO>|RELATED'

The new sequence config parameter above lets you define both the label filters and relationship filters to use for the repeating sequence (and ignores labelFilter and relationshipFilter if present).
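
As a rough sketch of how this is typically passed, assuming apoc.path.expandConfig as the expansion procedure and a hypothetical start node, labels, and relationship types:

// Repeat the sequence Person/Investor -> OWNS_STOCK_IN -> Company/Bank, up to 6 levels deep.
MATCH (start:Person {name: 'Alice'})
CALL apoc.path.expandConfig(start, {
  sequence: 'Person|Investor|-Cleared, OWNS_STOCK_IN>, Company|>Bank',
  maxLevel: 6
}) YIELD path
RETURN path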

Path Expansion Improvements

  • Compound labels (like Person:Manager) allowed in the label filter, applying only to nodes with all of the given labels.
  • endNodes and terminatorNodes config parameters for supplying a list of the actual nodes that should end each path during expansion (terminatorNodes end further expansion down the path, endNodes allow expansion to continue)
  • For labelFilter, the whitelist symbol + is now optional. Lack of a symbol is interpreted as a whitelisted label.
  • Some minor behavioral changes to the end node > and termination node / filters, specifically when it comes to whitelisting and behavior when below minLevel depth.

Path Functions

(This one came from a request in neo4j.com/slack.)

  • apoc.path.create(startNode, [rels])
  • apoc.path.slice(path, offset, length)
  • apoc.path.combine(path1, path2)
MATCH (a:Person)-[r:ACTED_IN]->(m)
...
MATCH (m)<-[d:DIRECTED]-()
RETURN apoc.path.create(a, [r, d]) as path

MATCH path = (a:Root)<-[:PARENT_OF*..10]-(leaf)
RETURN apoc.path.slice(path, 2, 5) as subPath

MATCH firstLeg = shortestPath((start:City)-[:ROAD*..10]-(stop)),
      secondLeg = shortestPath((stop)-[:ROAD*..10]->(end:City))
RETURN apoc.path.combine(firstLeg, secondLeg) as route

Text Functions

  • apoc.text.code(codepoint), apoc.text.hexCharAt(), apoc.text.charAt() (thanks to Andrew Bowman)
  • apoc.text.bytes/apoc.text.byteCount (thanks to Jonatan for the idea)
  • apoc.text.toCypher(value, {}) for generating valid Cypher representations of nodes, relationships, paths, and values (see the sketch after this list)
  • Sørensen-Dice similarity (thanks, Florent Biville)
  • Roman Arabic conversions (thanks, Marcin Cylke)
  • New email and domain extraction functions (thanks, David Allen)
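
As a minimal sketch for apoc.text.toCypher (the map literal is purely illustrative; check the APOC docs for the available config options):

// Render a plain value as a Cypher literal string, e.g. for generating scripts.
RETURN apoc.text.toCypher({name: 'Neo', roles: ['The One']}, {}) AS cypherLiteral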

Data Integration

  • Generic XML import with apoc.import.xml() (thanks, Stefan Armbruster)
  • Pass Cypher parameters to apoc.export.csv.query
  • MongoDB integration (thanks, Gleb Belokrys)
  • stream apoc.export.cypher script export back to the client when no file name is given
  • apoc.load.csv (see the sketch after this list)
    • Handling of converted null values and/or null columns
    • Explicit nullValues option to define values that will be replaced by null (global and per field)
    • Explicit results option to determine which output columns are provided
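
A rough sketch of combining these options (the file URL, column names, and the exact option values are hypothetical):

// Treat '' and 'NA' as null and only return the map form of each row.
CALL apoc.load.csv('file:///people.csv', {nullValues: ['', 'NA'], results: ['map']})
YIELD map
MERGE (p:Person {name: map.name})
SET p.city = map.city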

Collection Functions

  • apoc.coll.combinations(), apoc.coll.frequencies() (thanks, Andrew)
  • Update/remove/insert value at collection index (thanks, Brad Nussbaum)

Graph Refactoring

  • Per property configurable merge strategy for mergeNodes
  • Means to skip properties for cloneNodes

Other Additions

Other fixes and improvements in this release of the APOC library include:

  • apoc.load.jdbc (type conversion, connection handling, logging)
  • apoc.refactor.mergeNodes
  • apoc.cypher.run*
  • apoc.schema.properties.distinctCount
  • Composite indexes in Cypher export
  • ElasticSearch integration for ES 6
  • Made larger parts of APOC not require the unrestricted configuration
  • apoc.json.toTree (also config for relationship-name casing)
  • Warmup improvements (dynamic properties, rel-group)
  • Compound index using apoc.schema.assert (thanks, Chris Skardon)
  • Explicit index reads don’t require read-write-user
  • Enable parsing of lists in GraphML import (thanks, Alex Wilson)
  • Change CYPHER_SHELL format from upper case to lower case. (:begin,:commit)
  • Allowed apoc.node.degree() to use untyped directions (thanks, Andrew)

Feedback

As always, we’re very interested in your feedback, so please try out the new APOC releases, and let us know if you like them and if there are any issues.

Please refer to the documentation or ask in neo4j-users Slack in the #neo4j-apoc channel if you have any questions.

Enjoy the new release(s)!

Original Link

DevOps on Graphs: The 5-Minute Interview With Ashley Sun, Software Engineer at LendingClub [Video]

“Basically, anything you can think of in your infrastructure, whether it’s GitHub, Jenkins, AWS, load balancers, Cisco UCS, vCenter – it’s all in our graph database,” said Ashley Sun, Software Engineer at LendingClub.

DevOps at LendingClub is no easy feat: Due to the complexities and dependencies of their internal technology infrastructure – including a host of microservices and other applications – it would be easy for everything to spiral out of control. However, graph technology helps them manage and automate every connection and dependency from top to bottom. 

In this week’s five-minute interview (conducted at GraphConnect New York), Ashley Sun discusses how the team at LendingClub uses Neo4j to gain complete visibility into its infrastructure for deployment and release automation and cloud orchestration. The flexibility of the schema makes it easy for LendingClub to add and modify its view so that their graph database is the single up-to-date source for all queries about its release infrastructure.

Talk to us about how you use Neo4j at LendingClub.

Ashley Sun: We are using Neo4j for everything related to managing the complexities of our infrastructure. We are basically scanning all of our infrastructure and loading it all into Neo4j. We’ve written a lot of deployment and release automation and cloud orchestration, and it’s all built around Neo4j. Basically, anything you can think of in your infrastructure, whether it’s GitHub, Jenkins, Amazon Web Services (AWS), load balancers, Cisco Unified Computing System (UCS), vCenter – it’s all in our graph database.

We’re constantly scanning and refreshing this information so that at any given time, we can query our graph database and receive real-time, current information on the state of our infrastructure.

What made you choose Neo4j?

Sun: At the time, my manager was looking for a database that we could run ad-hoc queries against, something that was flexible and scalable. He actually looked at a few different graph databases and decided Neo4j was the best. 

Catch this week’s 5-Minute Interview with Ashley Sun, Software Engineer at LendingClub

What are some of the most interesting or surprising results you’ve seen while using Neo4j?

Sun: The coolest thing about Neo4j, for us, has been how flexible and easily scalable it is. If you’ve come from a background of working with the traditional SQL database where schemas have to be predefined — with Neo4j, it’s really easy to build on top of already existing nodes, already existing relationships and already existing properties. It’s really easy to modify things. Also, it’s really, really easy to query at any time using ad-hoc queries. 

We’ve been working with Neo4j for three years, and as our infrastructure has grown and as we’ve added new tools, our graph database has scaled and grown with us and just evolved with us really easily. 

Anything else you’d like to add or say?

Sun: It would be exciting for more tech companies to start using Neo4j to map out their infrastructure and maybe automate their deployments and cloud orchestration using Neo4j. I’d love to hear about how other tech companies are using Neo4j.

Original Link

This Week in Neo4j: Google Cloud Functions and Visualizing Facebook Events

Welcome to this week in Neo4j where we round up what’s been happening in the world of graph databases in the last 7 days.

Neo4j GraphTour Is Here

The Neo4j GraphTour gets started next week with our first stops in Tel Aviv and Madrid. This is your chance to hear how Neo4j customers are using graphs and learn all about the Graph Platform.

We’ll also be running GraphClinics at each event where you can come and ask Neo4j engineers for one-on-one help with your project and get all your graph related questions answered.

Michael Hunger or I will be at each event and will be presenting Utilizing Powerful Extensions for Analytics & Operations, where we’ll show you how to supercharge your Neo4j experience.

There are still seats available for some of the events, but don’t procrastinate too long; register now!

This week’s featured community member is Tim Williamson, Data Scientist and Associate Fellow at Monsanto Company.

Tim Williamson - This Week’s Featured Community Member

Tim has been a member of the Neo4j community for several years now and is a strong advocate for graphs on social media, frequently helping people out with their graph questions.

I first came across Tim during his presentation Graphs Are Feeding The World at GraphConnect SF 2015, which he then followed up with a 5-minute interview. You can also find the slides from Tim’s talk.

Tim also presented Using Graph Databases to Operationalize Insights from Big Data at Strata 2016 with Neo4j CEO Emil Eifrem.

On behalf of the Neo4j community, thanks for all your work Tim!

Online Meetup: Data Science in Practice: Importing and Visualizing Facebook Using Graphs

In this week’s online meetup Ray Barnard and Jen Webb from Suprfanz showed us how to import Facebook events into Neo4j and visualize them using d3.js.

They guided us through a Python-based tutorial which you can find in the suprfanz/flask-fb-neo4j-d3 GitHub repository.

Pick of the Week: Reactome – Efficient Access to Complex Pathway Data

A couple of weeks ago the paper behind Reactome – a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways – was published.

Reactome – Efficient access to complex pathway data

Reactome annotates processes in a consistent pathway model to create a resource for researchers as a core reusable pathway dataset for systems biology approaches. It also provides infrastructure and bioinformatics tools for search, visualization, interpretation, and analysis of pathways.

In the paper, the authors explain how they were able to use Neo4j and Cypher to greatly improve query efficiency, reducing the average query time by 93% compared to when they stored the data in MySQL.

On the Podcast: Laura Drummer

This week on the podcast, Rik interviewed Laura Drummer, Director of Software & Engineering at Novetta Solutions.

They talk about Laura’s work building social networks of not only people, but also things that they’re talking about. Laura explains how she’s been able to use Python Data Science tools scikit-learn and gensim to build these graphs.

Laura presented the talk Sentiment and Social Network Analysis at GraphConnect NYC 2017 so I’d recommend watching that if you want to learn more.

Next Week

What’s happening next week in the world of graph databases?

Tweet of the Week

My favorite tweet this week was by Bill J. Stidham:

Bill J. Stidham@billstidham

Been playing with this new (to me) database system “Graph Database”, @neo4j implementation. So far, it’s quite interesting. Basically coding in JSON.

Don’t forget to RT if you liked it too.

Original Link

Building a Graph Database Wannabe (Part 1)

Last year, Microsoft announced Cosmos DB, a multi-model database with graph support. I think multi-model databases are like Swiss army knives — they can do everything, just not very well. I imagine you would design it to be as good as it can be in its main use case while not losing the ability to do other things. So it’s neither fully optimized for its main thing nor very good at the other things. Maybe you can do pretty well with two things by making a few compromises, but if you try to do everything… it’s just not going to work out.

Can you imagine John Rambo stalking his enemies with an oversized Swiss army knife? Here, let me help with the mental image:

Yeah… not super effective, and definitely not nearly as cool. Sometimes you need a screwdriver, and sometimes you need a scary-looking big-ass knife. But instead of looking at the cosmos, let’s focus on our own solar system.

A few weeks ago, Amazon announced Neptune, their RDF/graph database hybrid. That’s two big players announcing graph-capable databases within months of each other. There are a bunch of small players also bringing graph databases to market.

Yeah, well, I’m gonna go build my own graph database with hook …wait, hold on.

According to database industry analyst Curt Monash, there are two cardinal rules of DBMS development:

Image title

How long have Amazon and Microsoft been working on these databases? How much money do those smaller players have? Hm… yeah, good luck deploying your mission-critical graph database-backed applications on those platforms… you’ll need it. But to be fair, someone had to take a chance on Neo4j once, too.

But forget about them; if I were building a new graph database today, I would start by watching all these videos and actually paying attention this time, then write it in C++ with SeaStar. It would be “in memory” because 99% of all operational databases fit in memory. It would be serializable because, once again, 99% of all operational databases don’t need more speed than a single core of a machine can handle. It would just log write queries instead of changes to the store. So an update-one-property query and an update-1M-properties query would both just write the query, not all the changes. I would avoid locking and latching and try to follow some of the same ideals as VoltDB, as explained by Stonebraker in this video presentation. It’s an hour long, but there is a TLDR version:

However, I don’t have five to seven years to spare, or tens of millions of dollars. But as DHH says, constraints are our friends. So I built Uranus DB, a graph database wannabe, in five to seven days and about ten bucks worth of Diet Dr. Pepper. I say five to seven days because I honestly can’t remember since most of the time I was drugged up on pain meds from my (second) heel surgery. Some people get allergic reactions; I write databases.

I am calling it Uranus because that is the girlfriend of Neptune in the Sailor Moon anime and frankly because I wanted to make the joke that Amazon’s Neptune is just behind Uranus.

Image title

If you are a casual reader, you can probably laugh, stop here, and go do something productive with your life. But if you are trying to decide if I am an idiot or a madman to write thousands of lines of code to make a joke, then continue on.

I am not going to go over the whole thing line by line (you can read the code yourself if you want to see what code on drugs looks like), but I do want to go over some things. In the center, there is a Graph API that looks and feels a lot like the Neo4j Java API. Underneath that is a mess of data structures mostly coming from fastutil. The real workhorse is a ReversibleMultiMap that houses the node relationship combinations. To the side of the Graph API is UranusGraph, a layer that implements the Tinkerpop Graph API so we can run Gremlin on Uranus.

As a side note: Yes, it really only takes about a week to add Gremlin to a database (badly, but tests still pass). It’s really easy to add, and that’s why everybody supports it… the problem is that it takes much longer and it is much harder to learn Gremlin itself. Sure, Marko, Stephen, and a handful of people know it well… but how many genius-level developers work for your company?

On top is an HTTP API that resembles the Neo4j REST API, but not really. A UranusServer running on top of Undertow takes requests and passes them off to a Disruptor that processes each item serially. The first handler persists write queries to a ChronicleQueue (which I would repurpose for replication and recovery if I get around to it) and then the second handler forwards the request to a series of actions which call the Graph API, serialize the answer, and reply. Still with me? Ha!

First, I realized that most of the time, I want quick access to a node by something I know about it. We call these Anchor nodes in our queries, where we parachute into the graph and then radiate outwards looking for whatever pattern we are interested in. So, I made it mandatory that all nodes have a single Label and a Key. For a User node, it may be that the key is their username property, for example. For nodes that don’t have a key, a random GUID would work, but of course, it would no longer be easy to remember how to get to it. Maybe a hash of properties would work… you decide. I am going to go small and just use integers as IDs for my Nodes and Relationships. I could have used longs, but that would have complicated some of the pieces, you’ll see. The Nodes part of the API looks like this:

int addNode(String label, String key);
int addNode(String label, String key, Map<String, Object> properties);
boolean removeNode(String label, String key);
Map<String, Object> getNode(String label, String key);
int getNodeId(String label, String key);
Map<String, Object> getNodeById(int id);
String getNodeLabel(int id);
String getNodeKey(int id);

Naturally, I wanted this concept to extend to Relationships, but they rarely if ever have a natural key. Instead, most of the time, I can point to a relationship by the two nodes it connects and the type. So I made the getRelationship method take a type and the label and keys of both nodes as parameters. This makes sense because most of the time, the relationship between nodes is unique. I FOLLOW this person on Twitter only once. I am a FRIEND of another person just once. We see this manifested in Neo4j by the massive amount of MERGE queries trying not to duplicate relationships between nodes.

But there is one small problem with this plan… what if two nodes are connected by the same relationship type more than once? Can I RATE a Movie more than once? Is that a second relationship or am I updating the first one? I can certainly WATCH a Movie more than once, and I definitely LISTEN to a Song more than once. So my choices are: allow only a single relationship of each type between nodes or simply add a “number” to the relationship API on access. So the Relationship part of the API looks like this:

int addRelationship(String type, String label1, String from, String label2, String to);
int addRelationship(String type, String label1, String from, String label2, String to, Map<String, Object> properties);
boolean removeRelationship(int id);
boolean removeRelationship(String type, String label1, String from, String label2, String to);
boolean removeRelationship(String type, String label1, String from, String label2, String to, int number);
Map<String, Object> getRelationship(String type, String label1, String from, String label2, String to);
Map<String, Object> getRelationship(String type, String label1, String from, String label2, String to, int number);
Map<String, Object> getRelationshipById(int id);

One of the things I like about the Neo4j API that was introduced in the last couple of years is the ability to quickly fetch the degree of a node. In Neo4j if a node has less than 40 relationships, it just counts them. If it’s a “dense” node with 40 or more relationships, we store the count by type in the relationship group chain of each node. So, I added a Node Degree part to the API… and yeah, it is missing the by ID access.

int getNodeDegree(String label, String key);
int getNodeDegree(String label, String key, Direction direction);
int getNodeDegree(String label, String key, Direction direction, String type);
int getNodeDegree(String label, String key, Direction direction, List<String> types);

For Traversals, sometimes, I want the properties of the relationships I am traversing and other times, I want the relationship IDs. Other times, I simply want the node IDs at the other side of the relationships and sometimes, I want the nodes on the other side. So, my Traversal part of the API looked like:

List<Map<String, Object>> getOutgoingRelationships(String label1, String from);
List<Map<String, Object>> getOutgoingRelationships(int from);
List<Map<String, Object>> getOutgoingRelationships(String type, String label1, String from);
List<Map<String, Object>> getOutgoingRelationships(String type, int from);
List<Integer> getOutgoingRelationshipIds(String label1, String from);
List<Integer> getOutgoingRelationshipIds(int from);
List<Integer> getOutgoingRelationshipIds(String type, String label1, String from);
List<Integer> getOutgoingRelationshipIds(String type, int from);
List<Integer> getOutgoingRelationshipNodeIds(String type, String label1, String from);
List<Integer> getOutgoingRelationshipNodeIds(String type, Integer from);
Object[] getOutgoingRelationshipNodes(String type, String label1, String from);
Object[] getIncomingRelationshipNodes(String type, String label2, String to);

Wait. Why are my Relationships Lists of Maps and my Nodes Object Arrays? I probably screwed up there. Anyway, moving on. I also wanted to be able to iterate over all the nodes and relationships (or their IDs) and by type or label… so:

Iterator<Map<String, Object>> getAllNodes();
Iterator getAllNodeIds();
Iterator<Map<String, Object>> getNodes(String label);
Iterator<Map<String, Object>> getAllRelationships();
Iterator getAllRelationshipIds();
Iterator<Map<String, Object>> getRelationships(String type);

Lastly, I wanted to quickly check if two nodes were related by type and direction, so I threw that in there, as well.

boolean related(String label1, String from, String label2, String to);
boolean related(String label1, String from, String label2, String to, Direction direction, String type);
boolean related(String label1, String from, String label2, String to, Direction direction, List<String> types);

Let’s look at the implementation and start with the Nodes. I want to be able to get to a node quickly, so I store an ArrayMap of HashMaps of Strings (Object2IntMap) to accomplish this. The outer ArrayMap has the Label of the nodes as the key, and the keys of the internal HashMaps are the keys of my nodes. The int stored as the value is the ID of the Node I want. The properties of the nodes are stored in an Array as a Map. Once set, the location of a node doesn’t change; deleted nodes are set to null but kept in place, and a RoaringBitmap keeps track of which node slots are deleted so they can be reused when new nodes are created.

private Object2ObjectArrayMap<String, Object2IntMap<String>> nodeKeys;
private ObjectArrayList<Map<String, Object>> nodes;
private RoaringBitmap deletedNodes;

The relationships are a bit more complicated. Remember how I mentioned I wanted to be able to quickly check if a relationship exists between two nodes? But I can have many relationships of the same type between two nodes… so I have a Map that combines the relationship type and count as the key. So the majority of the time they would end in “1” like “FRIEND-1”, but sometimes you would have “LISTEN-1”, “LISTEN-2”, and “LISTEN-3” as keys to keep track of multiple relationships of the same type between nodes. The value is a HashMap of Long2Int. So the key is a long with the from node ID and to node ID (which are both ints) squished into a long, and the value is the actual relationship. Is that insane? Just buy more RAM. I also wanted to keep track of all the relationship counts by type… that one was easy enough:

private Object2ObjectArrayMap<String, Long2IntOpenHashMap> relationshipKeys; // relKey.put(((long)node1 << 32) + node2, id); <-- squish
private Object2IntArrayMap<String> relationshipCounts;

The properties of the relationships are kept in a big Array of Maps just like the nodes, with a matching RoaringBitmap for the deleted ones. To keep track of how many relationships of the same type between two nodes I have, I have relatedCounts which uses the same squish technique as the key and stores the count as the value. I probably could have rolled it into relationshipKeys using a special key like “{type}-count”, but it’s fine.

private ObjectArrayList<Map<String, Object>> relationships;
private Object2ObjectArrayMap<String, Long2IntOpenHashMap> relatedCounts;
private RoaringBitmap deletedRelationships;

Now, we get to related. This is a Map of ReversibleMultiMaps. The key is just the relationship type. The value is a lie.

 private Object2ObjectOpenHashMap<String, ReversibleMultiMap> related;

Inside each ReversibleMultiMap are actually four Multimaps. One pair stores the node to node IDs and their reverse; the other pair stores the node to relationship IDs and their reverse. I do this because sometimes I just want node IDs, other times, I want relationship IDs, and sometimes, I need the whole relation object (which fetches it from the relationships array by ID).

private Multimap<Integer, Integer> from2to = ArrayListMultimap.create();
private Multimap<Integer, Integer> from2rel = ArrayListMultimap.create();
private Multimap<Integer, Integer> to2from = ArrayListMultimap.create();
private Multimap<Integer, Integer> to2rel = ArrayListMultimap.create();

Let’s walk through just two methods: adding a node and adding a relationship to the graph. We start addNode by getting or creating the Object2IntMap for the label of our new Node. If this Label-Key combination already exists, we exit with a -1. Otherwise, we add the label and key as “hidden” properties to the node. We then check the deletedNodes bitmap for an ID to reuse; if there is none, we use the size of our node array as the new ID. Either way, we include the id as another “hidden” property, insert the properties into the nodes array, add the label-key-id combination to our nodeKeys, and return the newly created node ID.

public int addNode(String label, String key, Map<String, Object> properties) {
    Object2IntMap<String> nodeKey = getOrCreateNodeKey(label);
    if (nodeKey.containsKey(key)) {
        return -1;
    } else {
        properties.put("~label", label);
        properties.put("~key", key);
        int nodeId;
        if (deletedNodes.isEmpty()) {
            nodeId = nodes.size();
            properties.put("~id", nodeId);
            nodes.add(properties);
            nodeKey.put(key, nodeId);
        } else {
            nodeId = deletedNodes.first();
            properties.put("~id", nodeId);
            nodes.set(nodeId, properties);
            nodeKey.put(key, nodeId);
            deletedNodes.remove(nodeId);
        }
        return nodeId;
    }
}

In our addRelationship method, we first find the node IDs for the from and to nodes or error out if we can’t find them. We add a new ReversibleMultiMap to related if this is a new relationship type. We increment our relationshipCounts by one. Then, we see how many relationships of this type between these two nodes exist and increment that by one. Then we set the ID by getting the relationship size… we should have checked deleted relationships here, I’ll fix it later. Next, the hidden properties get added and then the properties map gets added to relationships. We update the relatedCount, add our relationship to related ReversibleMultiMap, and add our relationship to relationshipKeys.

public int addRelationship(String type, String label1, String from, String label2, String to) {
    int node1 = getNodeKeyId(label1, from);
    int node2 = getNodeKeyId(label2, to);
    if (node1 == -1 || node2 == -1) {
        return -1;
    }
    related.putIfAbsent(type, new ReversibleMultiMap());
    relationshipCounts.putIfAbsent(type, 0);
    relationshipCounts.put(type, relationshipCounts.getInt(type) + 1);
    relatedCounts.putIfAbsent(type, new Long2IntOpenHashMap());
    Long2IntOpenHashMap relatedCount = relatedCounts.get(type);
    long countId = ((long) node1 << 32) + node2;
    int count = relatedCount.get(countId) + 1;
    int id = relationships.size();
    HashMap<String, Object> properties = new HashMap<>();
    properties.put("~incoming_node_id", node1);
    properties.put("~outgoing_node_id", node2);
    properties.put("~type", type);
    properties.put("~id", id);
    relationships.add(properties);
    relatedCount.put(countId, count);
    related.get(type).put(node1, node2, id);
    addRelationshipKeyId(type, count, node1, node2, id);
    return id;
}

That is a ton of stuff to keep track of. I think the worst thing is deleting a node, because you have to delete all of its relationships as well. Take a look at the source code if you are interested.

My dog Tyler is right next to me and just farted the nastiest fart. I think this is a good place to stop for now. I’ll continue in Part 2 soon.

Original Link

Retail and Neo4j: Supply Chain Visibility and Management

Now more than ever, supply chains are vast and complex.

Products are often composed of different ingredients or parts that move through different vendors, and each of those parts may be composed of subparts, and the subparts may come from still other subparts and other vendors from various parts of the world.

Because of this complexity, retailers tend to know only their direct suppliers, which can be a problem when it comes to risk and compliance. As supply chains become more complex — and also more regulated — supply chain visibility is more important than ever.

Fortunately, graph technology makes multiple-hop supply chain management simple for retailers and their suppliers.

In this series on Neo4j and retail, we’ll break down the various challenges facing modern retailers and how those challenges are being overcome using graph technology. In our previous posts, we’ve covered personalized promotions and product recommendation engines, customer experience personalization, and e-commerce delivery service routing.

This week, we’ll discuss supply chain visibility and management.

How Neo4j Enables Crystal-Clear Supply Chain Visibility

Retailers need transparency across the entire supply chain in order to detect fraud, contamination, high-risk sites, and unknown product sources.

If a specific raw material is compromised in some way, for example, companies must be able to rapidly identify every product impacted. This requires managing and searching large volumes of data without delay or other performance issues — especially if consumer health or welfare is on the line.

Supply chain transparency is also important for identifying weak points in the supply chain or other single points of failure. If a part or ingredient was previously available from three suppliers but is now only available from one, the retailer needs to know how that might affect future output.

Achieving visibility across the supply chain requires deep connections. A relational database is simply not built to handle a lot of recursive queries or JOINs, and as a result, performance suffers.

A graph database, however, is designed to search and analyze connected data. The architecture is built around data relationships first and foremost. This enables retailers and manufacturers to manage and search large volumes of data with no performance issues and achieve the supply chain visibility they need.

Recognizing the inherent risks of the supply chain, Transparency-One sought to build a platform that allows manufacturers and brand owners to learn about, monitor, analyze, and search their supply chain, and to share significant data about production sites and products.

Transparency-One initially considered building the platform on a classic SQL database-type solution. However, the company quickly realized that the volume and structure of information to be processed would have a significant impact on performance and cause considerable problems. So, Transparency-One began looking at graph databases.

Neo4j was the only graph database that could meet Transparency-One’s requirements, including the capacity to manage large volumes of data. Neo4j is also the most widely used graph database in the world, both by large companies and startups.

“We tested Neo4j with dummy data for several thousand products, and there were no performance issues,” says Chris Morrison, CEO of Transparency-One. “As for the search response time, we didn’t have to worry about taking special measures, since we got back results within seconds that we would not have been able to calculate without this solution.”

Using Neo4j, Transparency-One got up and running and built a prototype in less than three months. Since then, the company has extended Neo4j with new modules and the platform is currently deployed by several companies.

Conclusion

With so many partners, suppliers, and end-consumers growing more interconnected than ever before, retailers must have complete end-to-end supply chain visibility in order to proactively address issues in their supply chains — whether it’s a contamination outbreak or a faulty part.

However, when retailers reimagine their data as a graph, they transform a complex problem into a simple one. Furthermore, using graph visualization to manage and oversee complex supply chains allows human managers (and not just algorithms) the ability to instantly pinpoint and fix critical junctures, single points of failure and other problems within the supply network.

In the coming weeks, we’ll take a closer look at other ways retailers are using graph technology to create a sustainable competitive advantage, including revenue management and IT operations.

Original Link

Retail and Neo4j: E-Commerce Delivery Service Routing

As a retailer, if you think keeping up with Amazon is expensive and time-consuming, consider the alternative: extinction.

When it comes to delivery and fulfillment, Amazon is the uncontested emperor of e-commerce. Yet, their efficiency in tracking and delivering orders isn’t a complete secret: graph technology.

Now that off-the-shelf graph database platforms like Neo4j are available to smaller-than-Amazon retailers, it’s time to consider one way you can take back the lead: e-commerce delivery.

In this series on Neo4j and retail, we’ll break down the various challenges facing modern retailers and how those challenges are being overcome using graph technology. In our previous posts, we’ve covered personalized promotions and product recommendation engines as well as customer experience personalization.

This week, we’ll discuss e-commerce delivery service routing.

How Neo4j Transforms E-Commerce Delivery Service Routing

Amazon has set the standard for shipping and delivery. Thanks to its free, two-day shipping for Amazon Prime members, e-commerce shoppers aren’t willing to wait any longer than two days to receive their online purchases. As a result, retailers must meet or beat the standard or risk losing customers to Amazon.

To shorten delivery times, retailers must have visibility into inventory at storefronts and distribution centers, as well as the transit network. They need to know, for example, whether a routing problem could delay a product shipped from a distribution center located closer to the customer or whether a shortage of products makes it impossible to meet a specific delivery date altogether. Identifying the fastest delivery route requires support for complex routing queries at scale with fast and consistent performance.

E-commerce delivery service routing is a natural fit for graph databases given the highly connected nature of the data. It’s not just that it requires a lot of “hops” across data points but also that there can be many different paths with any number of permutations.

Those permutations may be optimized and deemed the best path at different times of the year and for different products, even within a single order. A graph database can take these various factors into account and support complex routing queries to streamline delivery services.

Case Study: eBay

Even before its acquisition by global e-commerce leader eBay, London-based Shutl sought to give people the fastest possible delivery of their online purchases. Customers loved the one-day service and it grew quickly. However, the platform Shutl built to support same-day delivery couldn’t keep up with the exponential growth.

The service platform needed a revamp in order to support the explosive growth in data and new features. The MySQL queries being used created a code base that was too slow and too complex to maintain.

The queries used to select the best courier were simply taking too long, and Shutl needed a solution to maintain a competitive service. The development team believed a graph database could be added to the existing Service-Oriented Architecture (SOA) to solve the performance and scalability challenges.

Neo4j was selected for its flexibility, speed, and ease of use. Its property graph model harmonized with the domain being modeled, and the schema-flexible nature of the database allowed easy extensibility, speeding up development. In addition, it overcame the speed and scalability limitations of the previous solution.

“Our Neo4j solution is literally thousands of times faster than the prior MySQL solution, with queries that require 10-100 times less code. At the same time, Neo4j allowed us to add functionality that was previously not possible,” said Volker Pacher, Senior Developer for eBay.

The Cypher graph query language allowed queries to be expressed in a very compact and intuitive form, speeding development. The team was also able to take advantage of existing code, using a Ruby library for Neo4j that also supports Cypher.

Implementation was completed on schedule in just a year. Queries are now easy and fast. The result is a scalable platform that supports the expansion of the business, including the growth it is now experiencing as the platform behind eBay Now.

Conclusion

Effectively competing with Amazon means your solution needs to be fail-safe, flexible and future-proof. While other technology solutions have narrow use-cases or fixed schemas, Neo4j allows you to evolve your e-commerce delivery platform as variables and circumstances change.

Using the power of graph algorithms that find the shortest path between your fulfillment centers and your customers, routing e-commerce deliveries will be a snap, not a headache.

In the coming weeks, we’ll take a closer look at other ways retailers are using graph technology to create a sustainable competitive advantage, including supply chain visibility, revenue management and IT operations.


Original Link
