You may find this article on sequence alignment in large text corpora, by the folks at ARTFL (http://artfl-project.uchicago.edu/), useful: http://www.digitalstudies.org/ojs/index.php/digital_studies/article/view/190/235
The software ARTFL uses is freely available, but setting it up and using it remain fairly challenging, and the learning curve is steep. Still, the ARTFL team is nothing if not friendly and would, I'm sure, be willing to jump in and assist if you think their tools would be of use for your project as well.