When you read policies or contracts all day, you tend to learn the value of nuance. For example, the words “will” and “shall” have very specific meanings in a contract. While “shall” sounds less commanding than “will”, it is actually a legally binding term meaning “you have a duty to do so”. According to the University of Texas, “shall is the most misused word in all of legal language.”
How much effort really goes into defining words?
A lot, apparently.
Merriam-Webster, the renowned English dictionary, updates its online index daily, and the task itself does not require scholars or academics. Merriam-Webster has made it clear that its employees simply read a “cross-section of published materials” to determine how frequently a new word is used. After a certain threshold of uses, they add the term to the dictionary, making sure to include citations and examples of how the word is spoken or written. If you go to the online dictionary, you can look up this information for any term. Even if you open a copy of the dictionary from twenty years ago, you will find the citations and examples included, especially for newer terms.
Merriam-Webster’s Dictionary API is exactly what you would expect from a 21st-century dictionary company. As with most organizations offering free data, you first have to register for a Developer account to get an API token. Once you’ve got a token, you can query any term (like “murmurously”) without even having to write a program. All you have to do is type a URL into your address bar:
URL STRUCTURE:
"https://www.dictionaryapi.com/api/v3/references/collegiate/json/" + any_word + "?key=" + your_api_key
EXAMPLE:
https://www.dictionaryapi.com/api/v3/references/collegiate/json/murmurously?key=/
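The same URL can also be assembled in code. This is a minimal sketch using only Python's standard library; the function name build_url and the placeholder key "your-api-key" are made up for illustration. Percent-encoding the word is an extra precaution for terms that contain spaces or punctuation.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://www.dictionaryapi.com/api/v3/references/collegiate/json/"


def build_url(any_word, your_api_key):
    # Percent-encode the word so spaces and punctuation are safe in the path
    path = quote(any_word, safe="")
    # urlencode builds the "key=..." query string and escapes the value
    query = urlencode({"key": your_api_key})
    return BASE_URL + path + "?" + query


# "your-api-key" is a placeholder, not a working key
print(build_url("murmurously", "your-api-key"))
```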
You should receive back a response that looks something like this:
RESULT:
[{"meta":{"id":"murmurous","uuid":"6d150be3-9dbd-40a1-8a8c-8d2cacd16761","sort":"134837000","src":"collegiate","section":"alpha","stems":["murmurous","murmurously"],"offensive":false},"hwi":{"hw":"mur*mur*ous","prs":[{"mw":"\u02c8m\u0259r-m\u0259-r\u0259s","sound":{"audio":"murmur02","ref":"c","stat":"1"}},{"mw":"\u02c8m\u0259rm-r\u0259s"}]},"fl":"adjective","def":[{"sseq":[[["sense",{"dt":[["text","{bc}filled with or characterized by {a_link|murmurs} {bc}low and indistinct"]]}]]]}],"uros":[{"ure":"mur*mur*ous*ly","fl":"adverb"}],"date":"1582","shortdef":["filled with or characterized by murmurs : low and indistinct"]}]
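Most of that response is nested detail, but a handful of top-level fields carry the essentials. The sketch below pulls them out of a trimmed copy of the response above. The helper name summarize and the keys of the dictionary it returns are my own labels, not part of the API, and skipping items that are not JSON objects is a defensive assumption rather than documented behavior.

```python
import json

# Trimmed copy of the "murmurously" response shown above
raw = (
    '[{"meta":{"id":"murmurous","stems":["murmurous","murmurously"],'
    '"offensive":false},"fl":"adjective","date":"1582",'
    '"shortdef":["filled with or characterized by murmurs : low and indistinct"]}]'
)


def summarize(entry):
    # Collect the fields that are easiest to read at a glance
    return {
        "headword": entry["meta"]["id"],
        "part_of_speech": entry.get("fl"),
        "first_known_use": entry.get("date"),
        "definitions": entry.get("shortdef", []),
    }


entries = json.loads(raw)
# Assumption: only dict items are dictionary entries, so skip anything else
summaries = [summarize(e) for e in entries if isinstance(e, dict)]
for summary in summaries:
    print(summary["headword"], "(" + summary["part_of_speech"] + ")")
    for definition in summary["definitions"]:
        print("  - " + definition)
```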
To make sense of the data structures, I tend to parse JSON in Python3. To automate this process, I needed a script that requests a word from the API, converts the response to JSON, makes each item human-readable, and saves each item to its own file.
Here’s that short Python3 script I wrote:
#!/usr/bin/env python3
import json
import requests

#############
# Variables #
#############
word = "innovation"
api_key = "12345678-9abc-defg-hijk-lmnopqrstuvw"


def get_word(url):
    # Request the URL using the API key
    print("Requesting " + url)
    url = url + "?key=" + api_key
    # Send a GET request to the URL we crafted
    page = requests.get(url)
    # Stop if the server response gives a bad status code
    if page.status_code != 200:
        raise SystemExit("Request failed with status code " + str(page.status_code))
    # Convert the server response to JSON
    response = json.loads(page.text)
    # Tell the user how big the object is
    print("JSON items: " + str(len(response)))
    # Return the response value to the calling function
    return response


def pretty(data):
    # Convert JSON input to a pretty, human-readable format
    output = json.dumps(data, indent=2)
    # Return the formatted string to the calling function
    return output


########
# MAIN #
########
# Do not change this unless you are working in a different dictionary
url = "https://www.dictionaryapi.com/api/v3/references/collegiate/json/" + word
# Call our function above using the URL we crafted
response = get_word(url)
# Do the following for every item in the JSON response
for i, item in enumerate(response):
    # Make the data look good
    result = pretty(item)
    # Craft a filename for this item
    filename = "output" + str(i) + ".json"
    # Save our JSON data to the file; "with" closes it so we can open it elsewhere
    with open(filename, "w") as outfile:
        outfile.write(result)
The script writes each item in the response to its own file. The first one (output0.json) should look like this:
{
  "meta": {
    "id": "innovation",
    "uuid": "9b912158-88bc-4c78-8b64-baff5c81d396",
    "sort": "091741000",
    "src": "collegiate",
    "section": "alpha",
    "stems": [
      "innovation",
      "innovational",
      "innovations"
    ],
    "offensive": false
  },
  "hwi": {
    "hw": "in*no*va*tion",
    "prs": [
      {
        "mw": "\u02cci-n\u0259-\u02c8v\u0101-sh\u0259n",
        "sound": {
          "audio": "innova04",
          "ref": "c",
          "stat": "1"
        }
      }
    ]
  },
  "fl": "noun",
  "def": [
    {
      "sseq": [
        [
          [
            "sense",
            {
              "sn": "1",
              "dt": [
                [
                  "text",
                  "{bc}a new idea, method, or device {bc}{sx|novelty||}"
                ]
              ]
            }
          ]
        ],
        [
          [
            "sense",
            {
              "sn": "2",
              "dt": [
                [
                  "text",
                  "{bc}the introduction of something new"
                ]
              ]
            }
          ]
        ]
      ]
    }
  ],
  "uros": [
    {
      "ure": "in*no*va*tion*al",
      "prs": [
        {
          "mw": "\u02cci-n\u0259-\u02c8v\u0101-shn\u0259l",
          "sound": {
            "audio": "innova05",
            "ref": "c",
            "stat": "1"
          }
        },
        {
          "mw": "-sh\u0259-n\u1d4al"
        }
      ],
      "fl": "adjective"
    }
  ],
  "date": "15th century{ds||2||}",
  "shortdef": [
    "a new idea, method, or device : novelty",
    "the introduction of something new"
  ]
}
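The full definitions sit several levels deep under "def" and "sseq", and their text carries markup tokens such as {bc} and {sx|novelty||}. Comparing the "dt" text with "shortdef" suggests that {bc} stands in for a colon and that the first field of a piped token is the word to display. The sketch below walks the structure on that assumption. The helper names clean_markup and iter_senses are hypothetical, and the regular expressions are a rough approximation, not Merriam-Webster's official token handling.

```python
import re

# Trimmed copy of the "innovation" entry shown above
entry = {
    "def": [{"sseq": [
        [["sense", {"sn": "1", "dt": [
            ["text", "{bc}a new idea, method, or device {bc}{sx|novelty||}"]]}]],
        [["sense", {"sn": "2", "dt": [
            ["text", "{bc}the introduction of something new"]]}]],
    ]}],
}


def clean_markup(text):
    # Assumption: {bc} marks a colon separator, as the shortdef text suggests
    text = text.replace("{bc}", ": ")
    # Assumption: for piped tokens like {sx|novelty||}, keep the first field
    text = re.sub(r"\{[a-z_]+\|([^|}]*)[^}]*\}", r"\1", text)
    # Drop any other tokens we do not recognize
    text = re.sub(r"\{[^}]*\}", "", text)
    # Collapse whitespace and trim stray colons and spaces at the ends
    return re.sub(r"\s+", " ", text).strip(" :")


def iter_senses(entry):
    # Yield (sense number, cleaned text) for each plain "sense" item
    for definition in entry.get("def", []):
        for sense_group in definition.get("sseq", []):
            for kind, body in sense_group:
                if kind != "sense":
                    continue
                for dt_kind, dt_value in body.get("dt", []):
                    if dt_kind == "text":
                        yield body.get("sn", ""), clean_markup(dt_value)


senses = list(iter_senses(entry))
for number, text in senses:
    print(number + ". " + text)
```

Run against the entry above, the cleaned text lines up with the two "shortdef" strings, which is a quick way to check the token assumptions.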
For this project, I wrote a script to turn the JSON structure into a useful hierarchy chart. Along the way, I accidentally made a new Python module called json4tree. If you’d like to learn more about it, check out the source code on GitHub. Here’s the code I used to convert the JSON for my D3 visualization:
#!/usr/bin/env python3
# import the necessary modules
import json
import json4tree

# import your JSON data
with open("input.json", "r") as infile:
    json_file = json.load(infile)

# create a new handler
converter = json4tree.handler(json_file)

# You can either print your results...
print(converter.results)

# Or you can save your results
with open("output.json", "w") as outfile:
    outfile.write(converter.results)
Here’s a sample of what the resulting JSON file looks like:
{
  "name": "query",
  "children": [
    {
      "name": "meta",
      "children": [
        {
          "name": "id",
          "value": "innovation",
          "type": "str"
        },
        {"..."}
      ]
    }
  ]
}
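To give a feel for the transformation, here is a minimal sketch of how nested JSON could be turned into that name/children shape. It is not json4tree's actual implementation. The function name to_tree is hypothetical, and naming list items by their index is my own assumption, since the sample above does not show how lists are handled.

```python
def to_tree(name, data):
    # Dictionaries become a node with one child per key
    if isinstance(data, dict):
        return {"name": name,
                "children": [to_tree(key, value) for key, value in data.items()]}
    # Assumption: lists become a node whose children are named by index
    if isinstance(data, list):
        return {"name": name,
                "children": [to_tree(str(i), value) for i, value in enumerate(data)]}
    # Everything else is a leaf carrying its value and Python type name
    return {"name": name, "value": data, "type": type(data).__name__}


tree = to_tree("query", {"meta": {"id": "innovation"}})
print(tree)
```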
And here’s my final product – a D3 hierarchy-based collapsible tree:
[Interactive D3 collapsible tree: go ahead, click on the dots]
I want to give a special thanks to several organizations and people that made this writeup possible.
Without your API, I would not have been able to do this analysis.
Without your visualization tools, I would not have been inspired to explore new datasets.
Without your insightful writeup, I would not have published my first Python package.