Sitecore & MongoDB: Updating Historical GeoIP

Setup:

We have a Sitecore solution which has been running xDB for several years. This year, we finally got around to implementing a custom GeoIP lookup using the MaxMind libraries. It all worked great! As visitors browsed our website, our code would look up the visitor's geo data based on their IP address, update the interaction in MongoDB, and ultimately it would aggregate all the metrics down to the reporting database - just like it should. But we weren't done quite yet...

The Problem:

The problem came when our analysts took a look at the Experience Analytics. "Why do the reports only show country and city data going back to June?", they asked. "Because that's when we implemented GeoIP lookups", I responded. It seemed like a reasonable answer in the moment, and maybe the analysts would even have been satisfied with it. But of course the perfectionist thing in my brain kept me up all night tossing and turning until...viola...a solution popped into my mind.   

The Solution:

It occurred to me that we actually have all the historical interaction data in MongoDB, including the IP address of the visitor. So the obvious solution is to update the historical interactions in MongoDB with the new GeoIP data, and then perform a reporting database rebuild so the historical data will show in the Experience Analytics reports.

To do this, I had to write a little console app.

The application works like this:

  • connects to our MongoDB instance
  • iterates over all the interactions
  • reads the IP address for each interaction
  • looks up the geo data for the IP address
  • updates the interaction with the geo data
  • saves the interaction back to MongoDB 

Sounds pretty simple, and it actually is. Here is the code:

using MaxMind.GeoIP2;
using MaxMind.GeoIP2.Exceptions;
using Sitecore.Analytics.Model;
using System;
using MaxMindDatabase = MaxMind.Db;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;
using System.Net;

namespace MongoDBTool
{
    class Program
    {

        static void Main(string[] args)
        {
            Process(args).Wait();
            Console.ReadLine();
        }

        static async Task Process(string[] args)
        {
            var geoLite2_City_Path = AppDomain.CurrentDomain.BaseDirectory + "GeoLite2-City.mmdb";
            var geoIP2_ISP_Path = AppDomain.CurrentDomain.BaseDirectory + "GeoIP2-ISP.mmdb";
            var conString = "mongodb://localhost:27017";
            var Client = new MongoClient(conString);
            var DB = Client.GetDatabase("analytics");
            var collection = DB.GetCollection<BsonDocument>("Interactions");
            int updated = 0;
            int addressNotFound = 0;
            int errors = 0;

            using (var cityReader = new DatabaseReader(geoLite2_City_Path, MaxMindDatabase.FileAccessMode.Memory))
            using (var ispReader = new DatabaseReader(geoIP2_ISP_Path, MaxMindDatabase.FileAccessMode.Memory))
            {
                await collection.Find(new BsonDocument())
                 .ForEachAsync(x =>
                 {

                     var ip = new IPAddress(x["Ip"].AsByteArray);

                     if (ip != new IPAddress(0))
                     {
                         var geoIp = new WhoIsInformation();

                         try
                         {
                             var city = cityReader.City(ip);
                             if (city != null)
                             {
                                 geoIp.Country = city.Country.IsoCode ?? string.Empty;
                                 geoIp.Region = city.MostSpecificSubdivision.IsoCode ?? string.Empty;
                                 geoIp.City = city.City.Name ?? string.Empty;
                                 geoIp.PostalCode = city.Postal.Code ?? string.Empty;
                                 geoIp.Latitude = city.Location.Latitude;
                                 geoIp.Longitude = city.Location.Longitude;
                                 geoIp.MetroCode = (city.Location.MetroCode == null) ? string.Empty : city.Location.MetroCode.ToString();
                                 geoIp.Dns = city.Traits.Domain ?? string.Empty;
                             }

                             var isp = ispReader.Isp(ip);
                             if (isp != null)
                             {
                                 geoIp.BusinessName = isp.Organization ?? string.Empty;
                                 geoIp.Isp = isp.Isp ?? string.Empty;
                             }

                             var filter = Builders<BsonDocument>.Filter.Eq("_id", x["_id"]);
                             var update = Builders<BsonDocument>.Update
                                 .Set("GeoData.AreaCode", geoIp.AreaCode)
                                 .Set("GeoData.BusinessName", geoIp.BusinessName)
                                 .Set("GeoData.City", geoIp.City)
                                 .Set("GeoData.Country", geoIp.Country)
                                 .Set("GeoData.Dns", geoIp.Dns)
                                 .Set("GeoData.Isp", geoIp.Isp);

                             if (geoIp.Latitude == null)
                             {
                                 update.Set("GeoData.Latitude", BsonNull.Value);
                             }
                             else
                             {
                                 update.Set("GeoData.Latitude", geoIp.Latitude);
                             }

                             if (geoIp.Longitude == null)
                             {
                                 update.Set("GeoData.Longitude", BsonNull.Value);
                             }
                             else
                             {
                                 update.Set("GeoData.Longitude", geoIp.Longitude);
                             }

                             update.Set("GeoData.MetroCode", geoIp.MetroCode)
                                 .Set("GeoData.PostalCode", geoIp.PostalCode)
                                 .Set("GeoData.Region", geoIp.Region)
                                 .Set("GeoData.Url", geoIp.Url);

                             collection.UpdateOne(filter, update);

                             updated++;
                         }
                         catch (MaxMindDatabase.InvalidDatabaseException)
                         {
                             errors++;
                         }
                         catch (AddressNotFoundException)
                         {
                             addressNotFound++;

                             var filter = Builders<BsonDocument>.Filter.Eq("_id", x["_id"]);
                             var update = Builders<BsonDocument>.Update
                                 .Set("GeoData.AreaCode", "N/A")
                                 .Set("GeoData.BusinessName", "N/A")
                                 .Set("GeoData.City", "N/A")
                                 .Set("GeoData.Country", "N/A")
                                 .Set("GeoData.Dns", "N/A")
                                 .Set("GeoData.Isp", "N/A")
                                 .Set("GeoData.Latitude", BsonNull.Value)
                                 .Set("GeoData.Longitude", BsonNull.Value)
                                 .Set("GeoData.MetroCode", "N/A")
                                 .Set("GeoData.PostalCode", "N/A")
                                 .Set("GeoData.Region", "N/A")
                                 .Set("GeoData.Url", "N/A");

                             collection.UpdateOne(filter, update);
                         }
                         catch (GeoIP2Exception)
                         {
                             errors++;
                         }
                     }
                     else
                     {
                         addressNotFound++;
                     }

                     Console.WriteLine(string.Format("Total: {0} Updated: {1} Not Found: {2} Err: {3} IP: {4}", updated + addressNotFound + errors, updated, addressNotFound, errors, ip));

                 });
            }
        }

    }
}



As always, I'm sure there are at least a dozen other ways to accomplish this, and plenty of ways to improve my code as well. Please share your thoughts and ideas!

~David

Add comment

Loading