It was absolutely a blast to present my new machine learning library at CodeMash this year. One of the key goals of the library is to make it readily accessible to all of its users. Machine learning can be an intimidating subject, with its esoteric terms and complex math. This library is designed to ease the process of feature selection (more on that later) and training. It is obviously a work in progress, and any input is welcome (and wanted). If you'd like to try it, head over to the site to learn how to get started with nuML.
Machine learning is tricky, so I thought I would mention a few things you should know up front.
Accurate prediction depends on a whole slew of factors, including:
- Proper feature selection (did you pick the right properties as inputs for training?)
- Size of your data set (how many examples are you providing?)
- Quality of your data (how diverse your data is)
- Model selection (the discriminative power of the model you selected)
Because of these challenges, I suggest you do a fair amount of training to make sure the model behaves the way you want. In other words, if the model does not do the right thing all of the time, please do not blame me. Computers simply aren’t as smart as we think they should be.
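One common way to check whether a model "functions as you would like" is to hold back part of your labeled data, train on the rest, and measure how often the model labels the held-back examples correctly. The sketch below is a generic Python illustration of that holdout check, not nuML's API; the function names, the 1-nearest-neighbor stand-in model, and the toy data are all hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical example type: (feature vector, label).
Example = Tuple[Sequence[float], str]


def holdout_split(data: List[Example], test_fraction: float = 0.25, seed: int = 0):
    """Shuffle a copy of the data and split it into training and test sets."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_nearest_neighbor(train: List[Example]) -> Callable[[Sequence[float]], str]:
    """Stand-in model (hypothetical): predict the label of the closest training example."""
    def predict(x: Sequence[float]) -> str:
        def sq_dist(example: Example) -> float:
            features, _ = example
            return sum((a - b) ** 2 for a, b in zip(features, x))
        return min(train, key=sq_dist)[1]
    return predict


def accuracy(predict: Callable[[Sequence[float]], str], test: List[Example]) -> float:
    """Fraction of held-out examples the model labels correctly."""
    if not test:
        raise ValueError("test set is empty")
    correct = sum(1 for features, label in test if predict(features) == label)
    return correct / len(test)


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated clusters labeled "a" and "b".
    data = [((float(i), float(i)), "a") for i in range(10)] + [
        ((float(i) + 20, float(i) + 20), "b") for i in range(10)
    ]
    train, test = holdout_split(data)
    model = train_nearest_neighbor(train)
    print(f"held-out accuracy: {accuracy(model, test):.2f}")
```

On real data, a low held-out score is a hint to revisit the factors listed above: the features you chose, how much data you have, its quality, or the model itself.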
I have done a fair amount of optimization to accommodate large amounts of data. That being said, the system uses memory as its scratch space. Additionally, generating any model generally involves a fair number of calculations. In short, if you are running this on a lot of data, it might take a while. Once the model is generated, however, it is generally super-fast at prediction time.
As promised, I am providing links to both my supervised and unsupervised learning talks. If you have any questions, feel free to ping me!