How to use ROE and EV/EBITDA in Python to Discover Bargain Stocks

Implementing two of Eddy Elfenbein’s articles from the Crossing Wall Street website. In them, Eddy explains why return on equity (ROE) is so important and why he considers EV/EBITDA the single most important metric.

I used Selenium and chromedriver to scrape the required data from Yahoo Finance and implement the concepts shared in the articles. To run it yourself, just pass one or more stock tickers to the function.
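The scraping code lives in the full post, but the two metrics themselves reduce to simple arithmetic. A minimal sketch with illustrative figures (the numbers below are made up, not taken from the articles or from any real company):

```python
def roe(net_income, shareholders_equity):
    """Return on equity: profit generated per dollar of shareholders' equity."""
    return net_income / shareholders_equity

def ev_to_ebitda(market_cap, total_debt, cash, ebitda):
    """Enterprise value (market cap + debt - cash) divided by EBITDA."""
    enterprise_value = market_cap + total_debt - cash
    return enterprise_value / ebitda

# Illustrative figures in millions -- not real company data.
print(round(roe(120, 800), 3))                     # profit per dollar of equity
print(round(ev_to_ebitda(5_000, 1_200, 300, 700), 2))
```

A lower EV/EBITDA suggests a cheaper valuation relative to cash earnings, while a consistently high ROE suggests the business compounds equity efficiently.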

Read More

Automated File Downloader With Python, Chromedriver and Selenium

Downloading a single file from a website is easy and hardly needs a tutorial. It becomes a daunting task when you need to download hundreds or thousands of files. For many employees, a large share of daily work involves downloading files and then stitching them together for a report or analysis.

In this short tutorial, I will show how to download one year of stock price data from Yahoo Finance for multiple companies using their ticker symbols. Python, chromedriver, and the Selenium package automate the process, and the same code can be used to download any other file type.
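Driving the browser requires Selenium, but building the per-ticker download URLs needs only the standard library. A sketch assuming Yahoo Finance's historical-data CSV endpoint (the URL pattern is an assumption that may change, and the live endpoint may also require session cookies):

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

def history_url(ticker, days=365):
    """Build a one-year historical-price CSV URL for a ticker (assumed endpoint)."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    params = urlencode({
        "period1": int(start.timestamp()),  # range start, epoch seconds
        "period2": int(end.timestamp()),    # range end, epoch seconds
        "interval": "1d",                   # daily bars
        "events": "history",
    })
    return f"https://query1.finance.yahoo.com/v7/finance/download/{ticker}?{params}"

urls = [history_url(t) for t in ("AAPL", "MSFT", "GOOG")]
```

In the full post these URLs are visited by a chromedriver session configured to save files to a known download directory.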

Read More

PDF OCR and Named Entity Recognition: Whistleblower Complaint - President Trump and President Zelensky


In this short post we are going to retrieve all the entities in the “whistleblower complaint regarding President Trump’s communications with Ukrainian President Volodymyr Zelensky” that was declassified and made public today.

I apply the techniques from my two previous blog posts: PDF OCR and named entity recognition. Instead of reading through all 16 pages to extract the names, dates, and organizations mentioned in the complaint, we will use natural language processing to automate the task.
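Once the PDF has been OCR'd and an NER model has tagged the text, the last step is grouping the tagged entities by type. A minimal sketch of that aggregation, using made-up (text, label) pairs in place of real model output:

```python
from collections import defaultdict

def group_entities(entities):
    """Group (text, label) pairs by label, dropping duplicates but keeping order."""
    grouped = defaultdict(list)
    for text, label in entities:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)

# Stand-in NER output -- illustrative, not actual results from the complaint.
tagged = [
    ("Donald Trump", "PERSON"),
    ("Volodymyr Zelensky", "PERSON"),
    ("Ukraine", "GPE"),
    ("Donald Trump", "PERSON"),   # duplicate mention, collapsed below
    ("July 25, 2019", "DATE"),
]
by_type = group_entities(tagged)
```

The result is one deduplicated list per entity type, which is exactly the summary you would otherwise compile by hand from the 16 pages.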

Read More

Training a Domain-Specific Word2Vec Word Embedding Model with Gensim to Improve Your Text Search and Classification Results

In this post, I will show how to train your own domain-specific Word2Vec model on your own data. There are powerful off-the-shelf embedding models built by the likes of Google (Word2Vec), Facebook (fastText), and Stanford (GloVe), because they have the resources and years of research behind them. These models, trained on huge corpora, generally perform well, but they can fail on specific tasks in industries like health, finance, and law. There are two ways to solve this problem. First, train your own embeddings if you have enough data (upwards of a million text documents) and the compute power. Second, fine-tune one of the models listed above with your data, especially when your data is small (I will post a follow-up blog showing how to fine-tune Word2Vec models).

Word2Vec assumes that two words appearing in the same contexts share a similar meaning, and therefore that both should have similar vector representations. The vector of a word is a semantic representation of how that word is used in context. Representing words as dense vectors is at the core of the recent successes in applying deep learning to NLP.
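The “same context” idea can be made concrete: Word2Vec's skip-gram variant slides a window over the corpus and emits (target, context) training pairs. A pure-Python sketch of that windowing step, separate from any Gensim training code (the window size and example sentence are arbitrary choices):

```python
def skipgram_pairs(tokens, window=2):
    """Emit (target, context) pairs for each token and its neighbors in the window."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:                      # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

sentence = "the patient was prescribed aspirin".split()
pairs = skipgram_pairs(sentence, window=1)
```

Words that keep appearing with the same neighbors produce similar training pairs, which is why their learned vectors end up close together.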

Read More

Named Entity Recognition with the spaCy Python Package: Automated Information Extraction from Text - Natural Language Processing

In my previous post, I showed how to extract the entities in an article or text document using Stanford NLP’s named entity recognition (NER) package.

In this post, I will show how to do the same in a few lines of code with spaCy and compare the results from the two packages. Named entity recognition uses natural language processing to pull entities such as people, organizations, monetary amounts, geographic locations, times, and dates out of an article or document.
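Comparing two NER tools reduces to set operations on their (text, label) outputs. A sketch with illustrative entity lists standing in for the Stanford NLP and spaCy results (the entities below are invented; the two packages also use different label schemes, e.g. LOCATION vs. GPE):

```python
def compare_entities(a, b):
    """Return entities found by both tools and those unique to each."""
    a, b = set(a), set(b)
    return {"both": a & b, "only_first": a - b, "only_second": b - a}

# Illustrative outputs, not real results from either package.
stanford = {("Google", "ORGANIZATION"), ("London", "LOCATION")}
spacy_out = {("Google", "ORGANIZATION"), ("$5 million", "MONEY")}
diff = compare_entities(stanford, spacy_out)
```

A diff like this makes it easy to see where the packages agree and which entity types each one misses.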

Read More