Crawler For Neo Continues


Part 7 of our Neo4j Crawler project is under construction. This part is a beast. We are going to add Spring Boot, Spring Data, and Neo4j to our project all at once. You kinda have to do that to show a workable version. But rest assured, I’ll be keeping with the Scrumbucket style and adding nothing that isn’t necessary – the minimalist approach.

Let’s answer some of the design decisions before moving on:

Why Neo4j?

At the end of this project we want to show a web of connections between a site’s pages. By nature this is a graph. From a graph we can answer questions like: “What pages reference this page?” and even cooler, “What is the shortest path from pageR to pageZ?”
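To make those two questions concrete, here is a minimal in-memory sketch in plain Java. It is not the Neo4j API and not the code this series will end up with; the class name `LinkGraph` and its methods `addLink`, `referencedBy`, and `shortestPath` are hypothetical names made up for illustration. It assumes every link counts the same, so a plain breadth-first search is enough to find a path with the fewest hops.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for a page-link graph (illustration only).
class LinkGraph {
    // page -> pages it links to
    private final Map<String, Set<String>> outgoing = new HashMap<>();
    // page -> pages that link to it
    private final Map<String, Set<String>> incoming = new HashMap<>();

    // Record a directed link "from references to".
    void addLink(String from, String to) {
        outgoing.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
        incoming.computeIfAbsent(to, k -> new LinkedHashSet<>()).add(from);
    }

    // "What pages reference this page?"
    Set<String> referencedBy(String page) {
        return incoming.getOrDefault(page, Collections.emptySet());
    }

    // "What is the shortest path from start to goal?"
    // Breadth-first search over outgoing links; returns an empty list if unreachable.
    List<String> shortestPath(String start, String goal) {
        Map<String, String> cameFrom = new HashMap<>(); // visited page -> predecessor
        Deque<String> queue = new ArrayDeque<>();
        cameFrom.put(start, null);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(goal)) {
                // Walk the predecessor chain back to the start, then reverse it.
                List<String> path = new ArrayList<>();
                for (String p = goal; p != null; p = cameFrom.get(p)) {
                    path.add(p);
                }
                Collections.reverse(path);
                return path;
            }
            for (String next : outgoing.getOrDefault(current, Collections.emptySet())) {
                if (!cameFrom.containsKey(next)) {
                    cameFrom.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

With links pageR to pageA, pageA to pageZ, and a longer detour pageR to pageB to pageC to pageZ, `shortestPath("pageR", "pageZ")` walks the two-hop route through pageA, and `referencedBy("pageZ")` returns pageA and pageC. A graph database gives you this kind of traversal as a built-in query instead of hand-rolled code.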

This could be accomplished with a relational database. However, you would have to write some pretty fancy queries and a lot of code to traverse a network of links. For example, try to implement the A* algorithm on data stored in SQL. Possible, but SQL is more of a barrier than a help. Also, in my experience a graph database is much faster for this kind of traversal than SQL tables and indexes (someday I’ll write an article on why).

Also, this could be accomplished with a document database like Mongo, but barely. The result would be even worse than a relational database.

This whole project set screams “Graph!” The most popular and best-supported graph database is Neo4j. Besides, it integrates well with the Spring Framework.

Why Spring Data?

You can certainly implement this with native calls to Neo4j. (I did do an implementation that way for a consulting firm.) However, Spring Data provides a level of normalization to the API that I find appealing. It looks almost identical whether the underlying database is SQL, Mongo, or Neo4j. And since I use all three, it’s just easier to move around. Plus, if you ever want to present this information on a web site, you would add in Spring MVC, and the two play nicely together.

Also, I just like Spring Data Repositories. They provide simple query methods, paging, and a standard CRUD interface without you writing any implementation code.

Why Spring Boot?

The Spring Framework was a real pain in the butt in the XML days. Over time more of the libraries started providing annotation-based configuration, a big improvement. So it went from a pain in the butt to a case of mild hemorrhoids.

Then came Spring Boot and a whole new attitude. Spring’s mantra used to be something like, “you can make anything if you can provide enough configuration data.” It was as if they were challenging anyone to get a working app. There was some kinda crazy rite of passage to make a web app on your own. Spring’s new attitude is, “we’ll take care of wiring and configuring everything for you, unless you’re determined to override it.” Ah, I can sit comfortably again.

If you’re hesitant about moving to Spring Boot, get over it. We’ve switched all our Spring-based apps over to it and are weaning ourselves off all the leftover configuration files. It feels so good.

Stay tuned….


About Author


Steve works with successful software startups and tech companies throughout Silicon Valley. Most recently he has been developing content migration tools for large websites. He has a deep passion for all things software engineering, from design concepts, to team management, to final delivery.

1 Comment

  1. Hello Steve,

    Thank you for reviving the series! You might want to read this before writing about Neo4j vs SQL indexes:

    Sun, Wen, Achille Fokoue, Kavitha Srinivas, Anastasios Kementsietsidis, Gang Hu, and Guotong Xie. 2015. “SQLGraph: An Efficient Relational-Based Property Graph Store.” In , 1887–1901. ACM Press. doi:10.1145/2723372.2723732.