What One Lawyer Learned From a 50-Hour Data Science Bootcamp
With increased focus on the growing role of data science and data analytics in the future of law, I decided it was high time to learn what all the fuss is about. Initially, I considered taking a course on data analytics geared for lawyers, but shockingly, I couldn’t find much, with the exception of a couple of classes focused on e-discovery, where predictive coding is a hot topic. One new player in the legal data space, LexPredict, also offers a number of trainings for lawyers, but the company seemed geared towards biglaw and, in any event, didn’t list dates or prices for its classes.
Unable to find data analytics classes for lawyers, I decided to approach the subject from another angle: start with the data science tech and work my way back to the law. That approach gave me a plethora of options, from low-cost classes at Udemy and Coursera to 12-week bootcamps costing $10k or more. However, because I didn’t have the luxury of giving up my day job, I knew I’d need a compact course: any program that dragged on for weeks or months would increase the chances that I’d drop out once my caseload or a client emergency got in the way. Likewise, since taking time out of my practice for a class would already cost me some income, I didn’t want to shell out several thousand dollars on tuition as well.
Based on these criteria, The Data Science Dojo’s Data Science Bootcamp fit the bill: it’s a reasonably priced, 5-day, 50-hour onsite program with no prerequisites (though there were about 10 hours of pre-class prep). And the class covered broad ground. Over the course of the week, I learned tools like basic R, MS Azure, Hadoop and Hive, along with concepts like data mining and visualization, predictive modeling, ensemble methods such as bagging and boosting, random forests, the importance of cross validation, the difference between training and test data, A/B testing basics, building a recommendation system, and handling real-time and streaming data (we hacked together a quick IoT solution using Azure tools, though truth be told, I was pretty much lost by then). Below are some of my takeaways on big data, especially as it relates to the legal profession, and on what it’s like for a lawyer to learn a new skill at an advanced age.
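The difference between training and test data, and why cross validation matters, can be illustrated with a minimal pure-Python sketch. It is not what the bootcamp used (the class worked in R and Azure); the one-feature threshold “model”, the function names and the synthetic data below are all hypothetical stand-ins chosen to keep the example self-contained. The rows are dealt into five folds, and each fold takes a turn as test data while the model is fit only on the other four, so every score reflects rows the model never saw during fitting.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle the row indices 0..n-1 once, then deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(threshold, xs, ys):
    """Share of rows where the rule 'predict True if x >= threshold' matches the label."""
    return mean(1.0 if (x >= threshold) == y else 0.0 for x, y in zip(xs, ys))


def fit_threshold(xs, ys):
    """Hypothetical toy 'model': pick the cut-off that scores best on the TRAINING rows."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        acc = accuracy(t, xs, ys)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def cross_validate(xs, ys, k=5, seed=0):
    """Hold out each fold in turn as test data, fit on the rest, score on the held-out fold."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_idx = [j for j in range(len(xs)) if j not in held_out]
        t = fit_threshold([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        scores.append(accuracy(t, [xs[j] for j in test_idx], [ys[j] for j in test_idx]))
    return scores


# Hypothetical synthetic data: one numeric feature and a yes/no outcome,
# with roughly 10% of labels flipped so the pattern is not perfectly clean.
rng = random.Random(1)
xs = list(range(100))
ys = [(x >= 50) != (rng.random() < 0.1) for x in xs]

scores = cross_validate(xs, ys, k=5)
print("per-fold test accuracy:", scores)
print("mean:", mean(scores))

# For contrast: scoring the model on the same rows it was fit to tends to look
# better than held-out scores, which is why test data is kept separate.
in_sample = accuracy(fit_threshold(xs, ys), xs, ys)
print("accuracy on training data:", in_sample)
```

In practice a library such as scikit-learn, whose `cross_val_score` function runs the same split-fit-score loop for real models, would do this work; the hand-rolled version just makes the loop visible.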
Lesson 1: The mechanics of building a predictive model aren’t particularly difficult; understanding which features to include and how to approach the problem is, and that’s where domain knowledge is important. One of the underlying themes of the class is that data science (itself a buzzword) is merely a collection of skills; intuition and domain knowledge matter as much as coding a predictive model. Yet oddly, when data science is discussed in the legal profession, we downplay the importance of legal expertise and its value in creating effective models.
Lesson 2: Predictive models are iterative, and constant questioning is a good thing. Although most lawyers will argue a legal principle ad nauseam, when it comes to data, we’re surprisingly passive. For the past two years, Clio has released a Trends Report that produced interesting, albeit counterintuitive, results. Yet the results are reported as-is, with no questions about the methodology used, what the data means or how it was gathered. That’s not true data science; it’s groupthink.
Lesson 3: Big Legal Data Isn’t All That Big. Our instructor shared with us the Five V’s (Volume, Velocity, Variety, Veracity and Value), which are used to evaluate whether data rises to the level of big data. For volume, we’re talking about huge amounts of data: not terabytes, but exabytes and beyond, too large to be stored and processed on traditional machines. For example, 10 billion messages are exchanged on Facebook each day. It’s hard to imagine many sources of legal data that approach that volume. Our instructor’s point was that we shouldn’t turn a data problem into a big data problem unless absolutely necessary. So I wonder whether lawyers are using the term “big data” for small data, or treating ordinary data problems as big data problems.
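To get a rough sense of what the Facebook figure means for velocity, here is a back-of-the-envelope conversion, assuming (unrealistically) that the 10 billion messages are spread evenly across a 24-hour day:

```latex
\frac{10^{10}\ \text{messages/day}}{86{,}400\ \text{s/day}} \approx 1.16 \times 10^{5}\ \text{messages/s}
```

Real traffic is bursty, so peak rates would be higher than this average.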
Lesson 4: Kaggle Competitions Are Way Cool. I hadn’t known much about Kaggle before my class. Although our involvement with Kaggle was limited to an in-class competition over who could build the most accurate model to predict survival on the Titanic, more broadly, Kaggle serves as a platform where companies can crowdsource the creation of data models. Many of the contests attract large numbers of participants, most likely because the sponsors pony up substantial cash prizes as an incentive. Lawyers are often criticized for not crowdsourcing or sharing information the way other professions do, but I haven’t seen a single platform that offers lawyers any financial reward for creating content that might serve as the equivalent of case notes. If any of the companies adding blog content to supplement caselaw (as Fastcase, in collaboration with Lexblog, is doing now) offered a thousand-dollar award every week for the best content, I think we’d see an explosion of high-quality crowdsourced materials.
Lesson 5: All Practicing Lawyers, Not Just Millennials, Need to Understand New Technology. Most of the conversation about the importance of learning about big data, AI or other new tools comes in the form of advice about what millennials need to learn. But I think it’s even more important for those of us who are mid-career and older to keep pace with the future if we want to have control over how the last decade or two of our careers play out.
After 50 hours of bootcamp, I’ve had to catch up on client work, and I’m not sure how soon I’ll be able to apply all the fancy new tricks and knowledge I’ve learned. For now, I’m satisfied that I’ve at least taken the first step. When will you do the same?
I agree that all lawyers need to understand data and tech now. Small firm and solo lawyers benefit from ‘small d’ data, where they mine their own site data for areas of improvement or confirmation. Learning SEO is a good start.
Great post. For all the hubbub, more well-researched posts like this, based on actual research and learning rather than speculation and guessing, are really relevant.