This is just an idea. A nickname can apparently map to more than one formal name (for example, "Alex" can stand for Alexander or Alexandra). That means nicknames form a graph, and a direct one-to-one lookup list may be tricky. Here is a quick thought on how to approach a solution:
A. Build a graph of names and their known connections to formal names.
B. Attach probability data to each node specifying which characteristics trigger that match (sort of Markovian).
C. Start a server that accepts requests.
D. When a request comes in, first search for all potential connections from the nickname to a formal name; if there are none, save the nickname as a new node. The search can be depth-first or breadth-first, or it can be made more stochastic, for example with a random walk (I will need to study these more) or by starting from several randomly chosen points on the graph.
E. If exactly one formal name is found, return it.
F. If there are multiple formal names, take one of several approaches:
1. Calculate Bayesian probabilities given the specified characteristics and return the strongest match (this is the weakest solution).
2. Train a neural net with an appropriate kernel (RBF, linear, etc.) and return its result. (This is slow, and keeping a million neural nets in storage seems like a bad idea.)
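The steps above can be sketched roughly as follows. This is a minimal, hypothetical Python sketch rather than a finished design: the `NameGraph` class, the directed-edge layout (an edge from a nickname to a name it can stand for), the made-up priors and trait likelihoods, and the 0.01 smoothing floor are all my own assumptions. It covers steps A, B, D, E and F.1 using a breadth-first search; the server (C) and the neural-net option (F.2) are left out.

```python
from collections import deque

SMOOTHING = 0.01  # assumed floor for a trait never seen with a formal name


class NameGraph:
    """Hypothetical graph: an edge nickname -> name means 'nickname can stand for name'."""

    def __init__(self):
        self.edges = {}        # node -> set of names it can stand for
        self.formal = set()    # formal-name nodes (the search stops here)
        self.priors = {}       # P(formal name); illustrative values only
        self.likelihoods = {}  # formal name -> {trait: P(trait | formal name)}

    def add_formal(self, name, prior, likelihoods):
        self.formal.add(name)
        self.edges.setdefault(name, set())
        self.priors[name] = prior
        self.likelihoods[name] = likelihoods

    def link(self, nickname, target):
        self.edges.setdefault(nickname, set()).add(target)
        self.edges.setdefault(target, set())

    def candidates(self, nickname):
        """Breadth-first search from a nickname to every reachable formal name."""
        if nickname not in self.edges:
            self.edges[nickname] = set()  # step D: unknown nickname saved as a new node
            return []
        seen = {nickname}
        queue = deque([nickname])
        found = []
        while queue:
            node = queue.popleft()
            if node in self.formal:
                found.append(node)
                continue  # do not expand past a formal name
            for nxt in sorted(self.edges[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return found

    def resolve(self, nickname, traits=()):
        found = self.candidates(nickname)
        if not found:
            return None
        if len(found) == 1:
            return found[0]  # step E: a single match
        # Step F.1 (naive Bayes): score = prior * product of P(trait | name).
        def score(name):
            s = self.priors[name]
            for t in traits:
                s *= self.likelihoods[name].get(t, SMOOTHING)
            return s
        return max(found, key=score)


# Toy data; the probabilities are invented for illustration.
g = NameGraph()
g.add_formal("Alexander", 0.5, {"male": 0.95, "female": 0.05})
g.add_formal("Alexandra", 0.5, {"male": 0.05, "female": 0.95})
g.add_formal("William", 1.0, {"male": 0.95})
g.link("Alex", "Alexander")
g.link("Alex", "Alexandra")
g.link("Bill", "William")
g.link("Billy", "Bill")  # nickname of a nickname

print(g.resolve("Billy"))             # William
print(g.resolve("Alex", ["female"]))  # Alexandra
print(g.resolve("Zed"))               # None; "Zed" is now stored as a new node
```

Stopping the search at formal-name nodes keeps a query for "Alex" from wandering through Alexander to unrelated nicknames. The Bayesian step here is naive Bayes, so it assumes the characteristics are independent given the formal name; a random-walk search could replace `candidates` without changing `resolve`.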
When generating stand-alone nodes, it may be possible to use Levenshtein distance and other characteristics to attach nickname nodes to formal-name nodes based on a threshold. A clustering algorithm could use the formal names' average characteristics as cluster centers, and a hard cutoff (e.g., a Levenshtein distance of 1 or 2) could solidify the connection while limiting Type I (false match) and Type II (missed match) errors.
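A minimal sketch of the threshold idea, assuming plain edit distance on lower-cased strings, a hypothetical helper `attach_candidates`, and a default cutoff of 2. It covers only the hard cutoff, not the clustering step.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def attach_candidates(nickname, formal_names, max_distance=2):
    """Hypothetical helper: formal names within max_distance edits, closest first."""
    scored = sorted((levenshtein(nickname.lower(), f.lower()), f) for f in formal_names)
    return [f for d, f in scored if d <= max_distance]


print(attach_candidates("Jon", ["John", "Jonathan", "William"]))  # ['John']
```

A cutoff this tight mostly catches spelling variants such as "Jon" and "John"; true nickname pairs like "Bill" and "William" are four edits apart, so those links would still have to come from the known graph or from other characteristics.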
Stay tuned while I flesh out this post with actual code. It may remain just a proposal for a while.