Language Learning for the 21st Century: Interpersonal Communication Through Digital Communities
16 October 2018 — Texas Language Center: “Language Matters!” Lecture Series
Creating a Conversational Hebrew Vocabulary List: A Reproducible Use of Technology to Overcome Scarcity of Data
22 April 2018 — National Council of Less Commonly Taught Languages (NCOLCTL) 21st Annual Conference
Transitional Semi-Allophonic Spirantization in Tiberian Hebrew
16 February 2018 — Jil Jadid Graduate Student Conference in Middle Eastern Languages and Literatures
Lexical Variation in the Understanding of ברא: Homonymy or Polysemy
30 January 2015 — Students of the Ancient Near East 8th Annual Symposium
#merica: Culture and Diversity
12 February 2019 — Galilee Dreamers Project—Sparks of Change
Words, Collocations, and Technology: Teaching and Learning Vocabulary
13 January 2019 — Workshop for Masa Israel Teaching Fellows
Frequency Dictionary of Spoken Hebrew (FDOSH)
For my MA thesis at the University of Texas at Austin, I created a frequency dictionary of spoken Hebrew. All of the scripts and data for the project can be found on the project's GitHub page or as supplementary materials on Zenodo. The manuscript itself, written entirely in Markdown and LaTeX, has its own GitHub repository, but the final manuscript can be accessed directly here.
Though my thesis is now complete, this project is ongoing as I work to optimize the code, simplify the process, conduct further statistical analyses, and publish my final results. The abstract for the thesis can be found below:
Studies using word frequency dictionaries (on topics such as vocabulary acquisition, vocabulary load, extensive reading, and vocabulary testing) have historically centered on corpora and morphological issues specific to European languages, especially English. One of the reasons for this is the lack of resources that often plagues departments of less commonly taught languages. Corpora of spoken language are particularly difficult to obtain: the funding and time necessary often make such a project impossible.
This thesis is an effort to provide some of the methodology and tools necessary for educators interested in creating frequency dictionaries for research purposes, for their own classrooms, or even for wider dissemination. In doing so, it will provide an overview of some of the key decisions that must be taken into account for such a project.
This thesis explains the creation process behind the Frequency Dictionary of Spoken Hebrew (FDOSH), a list of the most common words in conversational Modern Hebrew. The tools used to create the FDOSH, including corpus resources and customized scripts, are provided as part of a repository of supplementary materials. The goal is to make the entire dictionary-creation process as reproducible as possible while allowing for flexibility and transparency in the tools used. The project does this by using well-documented open-source scripts written in Python, an easily readable programming language.
Beyond providing these tools, the present project explores the theory and many of the considerations that play an important role in the creation of a frequency dictionary. These include issues such as corpus size, corpus text type, whether the list is intended for general or specialized use, word family levels, and objective criteria. Issues regarding Hebrew’s synthetic morphology and ambiguous non-vocalized writing system are also addressed.
The project aims to serve as a catalyst for future research that may build upon the ideas discussed here. The development and open dissemination of tools such as these can only lead to greater cooperation among educators and researchers, to the benefit of all involved.
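As a rough illustration of the kind of counting step a frequency dictionary involves, the sketch below tallies how often each word form occurs in a set of texts and in how many texts it appears (its range). This is not the FDOSH code: the function names, the tokenization rule, and the sample sentences are all assumptions made for this example. It also counts surface forms only, so prefixed forms and ambiguous non-vocalized spellings are not resolved into dictionary entries.

```python
import re
from collections import Counter

# Assumption for this sketch: a "word" is any run of Hebrew letters
# (U+05D0 alef through U+05EA tav). Vowel points and punctuation are ignored.
HEBREW_WORD = re.compile(r"[\u05D0-\u05EA]+")


def tokenize(text):
    """Return the Hebrew word forms in a text, in order of appearance."""
    return HEBREW_WORD.findall(text)


def frequency_list(texts):
    """Build (word, frequency, range) rows from an iterable of texts.

    frequency = total occurrences across all texts
    range     = number of texts in which the word occurs at least once
    Rows are sorted by frequency, then range (both descending), then word.
    """
    frequency = Counter()
    text_range = Counter()
    for text in texts:
        tokens = tokenize(text)
        frequency.update(tokens)
        text_range.update(set(tokens))  # count each word once per text
    rows = [(word, frequency[word], text_range[word]) for word in frequency]
    return sorted(rows, key=lambda row: (-row[1], -row[2], row[0]))


if __name__ == "__main__":
    # Hypothetical stand-ins for transcripts of spoken Hebrew.
    sample_texts = ["שלום מה נשמע", "מה קורה", "שלום שלום"]
    for word, freq, rng in frequency_list(sample_texts):
        print(word, freq, rng)
```

Reporting range alongside raw frequency is one simple way to avoid ranking a word highly just because it is repeated many times in a single text; a fuller treatment would also need lemmatization and a dispersion measure.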