Debian packages as Graph Database

I have been playing around with Neo4j as a graph database, and while searching for a big dataset I decided to look at Debian packages (source and binary) from stable, testing, sid, and experimental, and to represent all of that in one big graph database.

While this is far from finished, the following entities and relations are represented:

  • source packages, unversioned and versioned
  • binary packages, unversioned and versioned
  • maintainers
  • all dependencies, including alternatives and versioned dependencies
  • relations like maintains, builds, etc.
  • suites (stable, testing, sid, experimental)
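Dependencies are the trickiest part of this model, because a Depends field mixes AND-ed clauses (comma-separated), OR-ed alternatives (pipe-separated), and optional version constraints. The following is a minimal sketch of how such a field could be split before turning it into nodes and edges. It is not the script used for this graph: the `Dep` tuple and `parse_depends` function are hypothetical names, the regex is a simplified reading of the Debian Policy relationship syntax, and for real archive data the python-debian library is a more complete choice.

```python
import re
from typing import List, NamedTuple, Optional


class Dep(NamedTuple):
    """One alternative inside a dependency clause (hypothetical structure)."""
    package: str
    relation: Optional[str]  # one of "<<", "<=", "=", ">=", ">>", or None
    version: Optional[str]   # e.g. "2.14" or "1:2.0~rc1", or None


# Simplified pattern for a single alternative such as "libc6 (>= 2.14)" or
# "perl:any". An architecture qualifier (":any") is accepted but dropped;
# arch restrictions ("[amd64]") and build profiles ("<!nocheck>") are matched
# and ignored. The deprecated bare "<" and ">" relations are not handled.
_ALT_RE = re.compile(
    r"^\s*(?P<pkg>[a-z0-9][a-z0-9+.\-]*)(?::[a-z0-9\-]+)?"
    r"\s*(?:\(\s*(?P<rel><<|<=|=|>=|>>)\s*(?P<ver>[^)\s]+)\s*\))?"
    r"\s*(?:\[[^\]]*\])?\s*(?:<[^>]*>\s*)*$"
)


def parse_depends(field: str) -> List[List[Dep]]:
    """Split a Depends-style field into AND-ed groups of OR-ed alternatives."""
    groups: List[List[Dep]] = []
    for clause in field.split(","):
        clause = clause.strip()
        if not clause:
            continue  # tolerate trailing commas
        alternatives = []
        for alt in clause.split("|"):
            m = _ALT_RE.match(alt)
            if m is None:
                raise ValueError(f"cannot parse dependency: {alt!r}")
            alternatives.append(Dep(m["pkg"], m["rel"], m["ver"]))
        groups.append(alternatives)
    return groups


if __name__ == "__main__":
    for group in parse_depends("libc6 (>= 2.14), default-mta | mail-transport-agent"):
        print(group)
```

In a graph, each inner list could become its own "alternatives" node linked to every candidate package, with the version constraint stored as a property on the edge. That is one possible modelling choice, not necessarily the one used here.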

The graph currently has 220618 nodes and 782323 edges. My first attempt to import it was to generate one long Cypher statement and throw that at cypher-shell. Well, that was not the best idea: after 24 hours I stopped the process and rewrote the generation script to produce CSV files instead. Using neo4j-import, the same amount of data was imported in 5 seconds (!!!).
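The speed-up comes from the bulk importer reading plain CSV files whose header row tells it which column is the node ID, the label, or the relationship endpoints and type. Below is a minimal sketch of writing such files with Python's csv module, assuming the standard import header conventions (`:ID`, `:LABEL`, `:START_ID`, `:END_ID`, `:TYPE`). The function names, the prefixed IDs such as `src:hello`, and the labels and relationship types are hypothetical examples, not the ones from the actual generation script.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_nodes(out: TextIO, nodes: Iterable[Tuple[str, str, str]]) -> None:
    """Write (id, name, label) rows under a bulk-import node header."""
    writer = csv.writer(out, lineterminator="\n")
    # "id:ID" stores the ID as property "id" and uses it for matching edges;
    # IDs must be unique across all node files, hence the hypothetical prefixes.
    writer.writerow(["id:ID", "name", ":LABEL"])
    for node_id, name, label in nodes:
        writer.writerow([node_id, name, label])


def write_relationships(out: TextIO, rels: Iterable[Tuple[str, str, str]]) -> None:
    """Write (start id, end id, type) rows under a bulk-import relationship header."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([":START_ID", ":END_ID", ":TYPE"])
    for start, end, rel_type in rels:
        writer.writerow([start, end, rel_type])


if __name__ == "__main__":
    # Preview in memory; for a real import, open files with newline="" instead.
    nodes_buf, rels_buf = io.StringIO(), io.StringIO()
    write_nodes(nodes_buf, [("src:hello", "hello", "SourcePackage"),
                            ("bin:hello", "hello", "BinaryPackage")])
    write_relationships(rels_buf, [("src:hello", "bin:hello", "BUILDS")])
    print(nodes_buf.getvalue())
    print(rels_buf.getvalue())
```

The resulting files are then handed to the importer through its `--nodes` and `--relationships` options; the exact command line differs between Neo4j versions, so check the documentation for the installed one.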

What I would like to add in the future is the whole package history, and maybe also all the bugs … if only I had easily accessible and parseable information about these items (Debian QA, maybe?). If you have any suggestions, please let me know.

More to come, stay tuned.


