WHAT IS MNIST? The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting. The original black and white (bilevel) images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were then centered in a 28x28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28x28 field.
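The centering step described above can be sketched in a few lines. This is only an illustration using the standard library, not the code that produced MNIST: the function name center_by_mass, the list-of-rows image format, and the rounding of the shift to whole pixels are all assumptions.

```python
def center_by_mass(glyph, size=28):
    """Place `glyph` (a list of rows of grey levels) on a size x size canvas
    so that its intensity-weighted center of mass sits at the canvas center.

    Assumption: the shift is rounded to whole pixels, and any pixel that
    would fall outside the canvas is silently dropped.
    """
    total = sum(sum(row) for row in glyph)
    if total == 0:
        raise ValueError("glyph contains no ink")

    # Intensity-weighted mean row and column of the glyph.
    cy = sum(r * v for r, row in enumerate(glyph) for v in row) / total
    cx = sum(c * v for row in glyph for c, v in enumerate(row)) / total

    # Translation that moves the center of mass to the middle of the canvas.
    mid = (size - 1) / 2
    dy, dx = round(mid - cy), round(mid - cx)

    canvas = [[0] * size for _ in range(size)]
    for r, row in enumerate(glyph):
        for c, v in enumerate(row):
            rr, cc = r + dy, c + dx
            if 0 <= rr < size and 0 <= cc < size:
                canvas[rr][cc] = v
    return canvas
```

Calling center_by_mass on a 20x20 list of grey levels returns a 28x28 list with the digit translated so that its center of mass is as close to the middle of the field as a whole-pixel shift allows.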
STRATEGY USED AND WHY: DECISION TREE: It is impossible to calculate all of the features in an image. We therefore use decision trees to simultaneously determine a small collection of informative features and construct a classifier. By only considering a small random sample of queries at each node, we are able to generate multiple randomized trees that determine a more varied and informative collection of features than is possible with a single tree. The trees, which provide posterior estimates of the class probabilities, are aggregated to produce a stable and robust classifier. We analyze the performance of this method and propose several means of augmenting its performance.
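Two ideas from this strategy can be illustrated in isolation: drawing a small random sample of queries at a node, and aggregating the per-tree posterior estimates. The sketch below assumes that queries are identified by integer indices and that aggregation is a plain average of the class probabilities; the names sample_queries and aggregate_posteriors are hypothetical and not taken from the project code.

```python
import random


def sample_queries(n_queries, k, rng):
    """Return k distinct query indices drawn at random from range(n_queries).

    Each node of a randomized tree would only evaluate this small sample
    instead of every possible query.
    """
    return rng.sample(range(n_queries), k)


def aggregate_posteriors(tree_posteriors):
    """Average the class-probability estimates of several trees.

    `tree_posteriors` holds one list of class probabilities per tree.
    Returns (index of the most probable class, averaged probabilities).
    Assumption: a plain unweighted mean is used for the aggregation.
    """
    n_trees = len(tree_posteriors)
    n_classes = len(tree_posteriors[0])
    mean = [sum(p[k] for p in tree_posteriors) / n_trees
            for k in range(n_classes)]
    best = max(range(n_classes), key=lambda k: mean[k])
    return best, mean
```

For example, three trees reporting [0.7, 0.2, 0.1], [0.4, 0.5, 0.1] and [0.6, 0.3, 0.1] average out in favor of class 0, even though the second tree alone would have chosen class 1.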
HOW IT IS IMPLEMENTED?
1. The necessary libraries are imported: numpy, scikit-learn, matplotlib, pandas.
2. The CSV file is loaded and converted into matrix form.
3. Training is done on 21,000 of the 45,000 items.
4. Testing is done on the remaining 24,000 items (45,000 - 21,000).
5. Each item is reshaped into a 28x28 matrix to produce the image.
6. The image is shown with the digit in black on a white background.
7. A prediction is made for a given input, and the corresponding image is shown as output.
8. The accuracy is printed to the screen.
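The data-handling steps above (loading, splitting, reshaping, scoring) can be sketched as follows. To stay dependency-free, this version uses only the standard library rather than the pandas/numpy calls the project imports. It is not the project's actual code: the helper names are hypothetical, and the CSV layout (a header row, then one row per item with the label in the first column followed by 784 pixel values) is an assumption.

```python
import csv
import io


def load_rows(csv_text):
    """Parse CSV text into (pixels, labels).

    Assumed layout: header row, then label in column 0 and 784 pixel
    values in the remaining columns.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    pixels, labels = [], []
    for row in reader:
        labels.append(int(row[0]))
        pixels.append([int(v) for v in row[1:]])
    return pixels, labels


def split(pixels, labels, n_train):
    """Use the first n_train items for training and the rest for testing."""
    train = (pixels[:n_train], labels[:n_train])
    test = (pixels[n_train:], labels[n_train:])
    return train, test


def to_image(flat, side=28):
    """Reshape a flat list of side*side pixels into `side` rows."""
    return [flat[r * side:(r + 1) * side] for r in range(side)]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

With the numbers given above, split(pixels, labels, 21000) would leave 24,000 items for testing. A classifier such as scikit-learn's DecisionTreeClassifier would then be fitted on the training part, and its predictions on the test part passed to accuracy.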
HOW THE ACCURACY CAN BE IMPROVED: Most notably, we introduce a nearest neighbor final test that reduces the already low error rate by an additional 20-30%. Testing was done on a subset of a National Institute of Standards and Technology database, and we report a classification rate of 99.6%, comparable to the top results reported elsewhere.
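The text does not say how the nearest neighbor final test is applied, so the following is only one plausible reading, with every detail an assumption: when the two most probable classes from the aggregated trees are closer than a margin, the decision is handed to a nearest neighbor search restricted to training items of those two classes. The function names, the margin value, and the use of squared Euclidean distance are all hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two flat pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify_with_nn_fallback(mean_posterior, image,
                              train_images, train_labels, margin=0.2):
    """Return a class label, consulting nearest neighbors only when unsure.

    Assumed rule (not from the source): if the best class leads the
    runner-up by at least `margin`, trust the trees; otherwise pick the
    label of the closest training item among those two classes.
    """
    ranked = sorted(range(len(mean_posterior)),
                    key=lambda k: mean_posterior[k], reverse=True)
    top, runner_up = ranked[0], ranked[1]
    if mean_posterior[top] - mean_posterior[runner_up] >= margin:
        return top

    best_label, best_dist = top, float("inf")
    for x, y in zip(train_images, train_labels):
        if y in (top, runner_up):
            d = squared_distance(image, x)
            if d < best_dist:
                best_label, best_dist = y, d
    return best_label
```

Restricting the search to the two candidate classes keeps the extra cost low, since the nearest neighbor comparison only runs for ambiguous items and only against part of the training set.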
SCREENSHOTS AND GITHUB LINK: Screenshots of the running code and the running application are attached in the “HANDWRITNG RECOGNITION CHALLENGE” folder itself.