Your first goal is to use the left/right/disparity images to build a three-dimensional point cloud of obstacles. After that, you will need to write an algorithm that tracks how those points move from one image to the next, and so on through the sequence. Once you have that working, you can feed your sensor inputs into a Monte Carlo localization algorithm (a particle filter) and get an increasingly accurate estimate of your position.
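To make the point cloud step concrete, here is a minimal sketch of turning a disparity image into 3D points using the standard pinhole stereo relation Z = f * B / d. It assumes a rectified stereo pair; the function name and the parameters (focal_px, baseline_m, cx, cy, min_disparity) are made up for illustration and would come from your own camera calibration.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def disparity_to_points(
    disparity: Sequence[Sequence[float]],
    focal_px: float,      # focal length in pixels (assumed known from calibration)
    baseline_m: float,    # distance between the two cameras in metres (assumed)
    cx: float,            # principal point, x (assumed)
    cy: float,            # principal point, y (assumed)
    min_disparity: float = 0.5,  # hypothetical cutoff for unmatched or far pixels
) -> List[Point3D]:
    """Back-project every pixel with a usable disparity into camera coordinates."""
    points: List[Point3D] = []
    for v, row in enumerate(disparity):
        for u, d in enumerate(row):
            if d < min_disparity:
                # Tiny or zero disparity means no match or a point too far away.
                continue
            z = focal_px * baseline_m / d   # depth from disparity
            x = (u - cx) * z / focal_px     # sideways offset from the optical axis
            y = (v - cy) * z / focal_px     # vertical offset from the optical axis
            points.append((x, y, z))
    return points
```

Calling it on a full disparity image gives you the list of (x, y, z) obstacle points in the left camera's frame. The pure-Python loops are slow on real images, so you would likely vectorise this once the math is clear.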
From the software standpoint, it's definitely not an easy thing to do - and you're hurting yourself severely by not taking it slowly. Work on the math involved first - the paper you attached is good for that - and just get point tracking working, then move on to visual odometry. Once that's reliable, you can think about SLAM.
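For when you do reach the localization stage, this is roughly what one step of the Monte Carlo approach mentioned above looks like: predict, weight, resample. It is a deliberately simplified sketch, assuming a robot moving along a single axis and measuring its range to one landmark at a known position; mcl_step and all of its parameters are hypothetical names, and your real measurement model would come from the tracked visual features instead.

```python
import math
import random
from typing import List


def mcl_step(
    particles: List[float],   # candidate positions along one axis
    motion: float,            # commanded displacement since the last step
    measurement: float,       # measured range to the landmark (assumed sensor)
    landmark: float,          # known landmark position (assumed)
    motion_noise: float,      # std dev of motion error (assumed)
    sensor_noise: float,      # std dev of range error (assumed)
    rng: random.Random,
) -> List[float]:
    """One predict / weight / resample cycle of a 1D particle filter."""
    # Predict: move every particle and add noise to model motion uncertainty.
    moved = [p + motion + rng.gauss(0.0, motion_noise) for p in particles]

    # Weight: Gaussian likelihood of the measurement given each particle.
    weights = [
        math.exp(-0.5 * ((measurement - (landmark - p)) / sensor_noise) ** 2)
        for p in moved
    ]
    if sum(weights) == 0.0:
        # Every weight underflowed; keep the prediction rather than divide by zero.
        return moved

    # Resample: draw particles in proportion to their weights.
    return rng.choices(moved, weights=weights, k=len(moved))
```

Run it in a loop, once per frame, and take the mean of the particles as your position estimate. The spread of the particles tells you how confident that estimate is.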