Sauvik Biswas

Comics enthusiast, Musician, Programmer and Traveller


36% likely that Sherlock said, “Elementary, my dear Watson”!

February 13, 2016 by Sauvik Biswas


I had built a crude N-gram parser and used the Sherlock Holmes books on Project Gutenberg as my training corpus. I was toying with various phrases and the likelihood of their appearance in the books. One such phrase was “Elementary, my dear Watson”.

“Elementary, my dear Watson” is technically a six-token sentence once the start and the end of the sentence are taken into account. To the program, the sentence looks like this: [‘<s>’, ‘elementary,’, ‘my’, ‘dear’, ‘watson’, ‘</s>’]. I was using the Stupid Backoff model to compute my probabilities. (Yes, that is what its creators, Brants et al., call it. Here is a link to the original paper.)
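For readers who want to see the idea in code, here is a minimal sketch of Stupid Backoff in Python. It is not the program from this post: the function names, the two-sentence toy corpus and the choice to score only the last token given everything before it are assumptions made for illustration. The back-off factor of 0.4 is the value suggested in the Brants et al. paper.

```python
from collections import Counter

ALPHA = 0.4  # back-off factor suggested by Brants et al.


def tokenize(sentence):
    """Lowercase, split on whitespace and add sentence boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]


def count_ngrams(sentences, max_n):
    """Count every n-gram of order 1..max_n over the tokenized sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff(word, context, counts, total_tokens):
    """Score `word` given `context` (a tuple of the preceding tokens)."""
    factor = 1.0
    while context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            # Seen n-gram: relative frequency, discounted once per back-off.
            return factor * counts[ngram] / counts[context]
        context = context[1:]  # drop the oldest token and try again
        factor *= ALPHA
    # No context left: fall back to the unigram relative frequency.
    return factor * counts[(word,)] / total_tokens


# Hypothetical toy corpus, NOT the Project Gutenberg texts.
corpus = ["Exactly, my dear Watson", "My dear fellow"]
counts = count_ngrams(corpus, max_n=6)
total_tokens = sum(c for gram, c in counts.items() if len(gram) == 1)

for text in ["Exactly, my dear Watson", "Elementary, my dear Watson"]:
    tokens = tokenize(text)
    score = stupid_backoff(tokens[-1], tuple(tokens[:-1]), counts, total_tokens)
    print(text, "->", score)
```

On this toy corpus the sentence that was seen verbatim scores 1.0, while the unseen one backs off twice before it finds “my dear watson </s>”, so its score is 0.4 × 0.4 × 1. Note that Stupid Backoff produces scores rather than normalised probabilities, and the numbers here say nothing about the real corpus.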

The program returned a value of 0.36. I had used a 6-gram model, so no Markov approximation was involved in analysing this six-token sentence. A Markov approximation would assume that, in an n-gram model, only the nearest n-1 words matter. (With a 4-gram model, the program would give the same output even if the first word were ‘elegant’, ‘extreme’, ‘au revoir’ or ‘exactly’ instead of ‘elementary’.)
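The effect of the Markov approximation can be sketched in a few lines of Python. The helper name `markov_context` and the list of variant first words are made up for this illustration, and the sketch assumes the model scores the last token given the words before it.

```python
def markov_context(tokens, n):
    """Return the context an n-gram model sees when predicting the last token.

    Under the Markov approximation only the nearest n-1 tokens are kept.
    """
    history = tokens[:-1]
    # Guard n == 1: history[-0:] would return the whole list, not an empty one.
    return tuple(history[-(n - 1):]) if n > 1 else ()


variants = ["elementary,", "elegant,", "extreme,", "exactly,"]
sentences = [["<s>", w, "my", "dear", "watson", "</s>"] for w in variants]

# A 4-gram model keeps only the last three tokens, so the first word is lost.
contexts_4 = {markov_context(s, 4) for s in sentences}
# A 6-gram model keeps all five preceding tokens, so the first word matters.
contexts_6 = {markov_context(s, 6) for s in sentences}

print(len(contexts_4), contexts_4)
print(len(contexts_6))
```

All four variants collapse to the single context (‘my’, ‘dear’, ‘watson’) under the 4-gram model, whereas the 6-gram model still tells them apart.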

A quick search through the n-grams showed that the phrase “elementary, my dear watson” was not registered (the program reduces all inputs to lowercase). The closest phrase was “exactly, my dear watson”. In fact, the probability of that phrase came out as 100%, which is hardly surprising, as it was in the training corpus.

Here is the truth. Sherlock Holmes never said the phrase “Elementary, my dear Watson.” I had no idea where this misattributed quote came from, so I searched online and stumbled upon a reliable answer.

It was P.G. Wodehouse who wrote that!

Posted in: Coding Tagged: AI, N-Gram, Sherlock Holmes
