Migrating Data from RDBMS to Graph Database

vteam #679 has been working on a client’s data analysis web application. The application sifts through the vast amount of information available to its users through social media, news and blogs to identify trends and patterns. The graph database was implemented using Neo4j and Python.

The client asked vteams engineer Usman Maqbool to migrate the social media data from a relational database to a graph database. The reason was that queries over relationships in the social media data were taking too long, as the database held more than 145 million records.

Solution

The data was growing day by day, and the RDBMS was taking a long time to run relationship queries over such a large dataset. A graph database (Neo4j), which handles relationships faster than an RDBMS, was therefore proposed as the solution.

To apply the solution, Usman had to migrate the data from the RDBMS to Neo4j and integrate the graph database into the live, running project.

Migration

Migrating the live data from the RDBMS to Neo4j in a single pass wasn’t possible, so it was decided to migrate the data in chunks using Celery (v3.1.23). Celery can be installed from the Python Package Index (PyPI) with the command pip install celery==3.1.23.

Once the installation is done, add the following code in file:

Now, add the following migration script in file:

The py2neo library was used to run Cypher queries from Python. In the above-mentioned script, the data is divided into 50 chunks to avoid a Java heap error.
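
The chunking idea can be sketched in plain Python as follows. This is a simplified illustration rather than the project's script: chunk_bounds, migrate_in_chunks, fetch_rows and write_rows are hypothetical names, and the two callables stand in for the RDBMS read and the py2neo write, which are not shown here.

```python
from typing import Callable, Iterator, List, Tuple


def chunk_bounds(total: int, n_chunks: int = 50) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) offsets splitting `total` rows into
    at most `n_chunks` contiguous, near-equal slices."""
    if total < 0 or n_chunks <= 0:
        raise ValueError("total must be >= 0 and n_chunks must be > 0")
    size, extra = divmod(total, n_chunks)
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional row each.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when total < n_chunks
            yield start, end
        start = end


def migrate_in_chunks(
    total_rows: int,
    fetch_rows: Callable[[int, int], List],
    write_rows: Callable[[List], None],
    n_chunks: int = 50,
) -> int:
    """Copy rows chunk by chunk so that only one chunk is held at a time.

    fetch_rows(start, end) is assumed to read rows from the relational
    database; write_rows(rows) is assumed to write them to the graph.
    """
    migrated = 0
    for start, end in chunk_bounds(total_rows, n_chunks):
        rows = fetch_rows(start, end)
        write_rows(rows)
        migrated += len(rows)
    return migrated


if __name__ == "__main__":
    # Stand-ins for the real source and target databases.
    source = list(range(1000))
    sink: List[int] = []
    count = migrate_in_chunks(len(source), lambda s, e: source[s:e], sink.extend)
    print(count, sink == source)
```

Keeping each write small bounds the amount of data Neo4j has to hold in a single transaction, which is the reason the text gives for chunking. In a Celery setup, each (start, end) pair could instead be queued as a separate task.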