How To Scrape Links From Reddit Comments

I recently needed to retrieve all the links from a particular subreddit, consolidate them, and keep a running list of new links as they are added. This article is about how to scrape links from Reddit comments, as opposed to Reddit posts. I won’t go into the details of storing the links in a database; instead, I want to focus on the meat of the Python script:

How to scrape links from Reddit Comments?

Python script used to scrape links from subreddit comments.
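
As a rough illustration of the link-extraction step only, here is a minimal sketch that pulls URLs out of comment text. It assumes the comment bodies have already been fetched as plain strings (for example with a Reddit API client such as PRAW); the function name `extract_links` and the regular expression are illustrative assumptions, not the script from the article.

```python
import re

# Assumption: a simple pattern for http(s) URLs. It stops at whitespace and at
# the closing bracket characters used by Markdown links, so URLs that
# themselves contain parentheses will be cut short.
URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")


def extract_links(comment_bodies):
    """Return the unique links found in the given comment texts, in first-seen order."""
    seen = {}
    for body in comment_bodies:
        for url in URL_PATTERN.findall(body):
            # Drop punctuation that usually belongs to the sentence, not the URL.
            url = url.rstrip(".,;:!?")
            seen.setdefault(url, None)
    return list(seen)


if __name__ == "__main__":
    sample_comments = [
        "See [the docs](https://example.com/a) and https://example.org/b.",
        "I also liked https://example.com/a",
    ]
    for link in extract_links(sample_comments):
        print(link)
```

Keeping the links in first-seen order makes it straightforward to append only the new ones to a running list later.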

Continue reading →

Catch-All Join To A Lookup Dimension

I recently ran into an interesting problem that I’d like to share and show how I resolved it. The solution involves a catch-all join to a lookup dimension table.

ERD Diagram of wildcard lookup status table.

Imagine having many employees who work in many departments. Each department has its own way of determining an employee’s status: some departments use the status code given in the source system, others rely solely on the department the employee is from, and others use a combination of both! Oh yeah, and the fun bit: this status logic can change…
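
One way to picture such a catch-all join is a lookup table in which a wildcard value matches anything. The sketch below uses Python's built-in `sqlite3` module; the table names, column names, sample rows, and the `'*'` wildcard convention are all assumptions made for illustration and may differ from the solution in the full article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (name TEXT, department TEXT, source_status TEXT);
CREATE TABLE status_lookup (department TEXT, source_status TEXT, status TEXT);

INSERT INTO employee VALUES
    ('Ann', 'Sales', 'A'),
    ('Bob', 'Sales', 'T'),
    ('Cy',  'IT',    'A'),
    ('Dee', 'HR',    'X');

-- '*' is a hypothetical wildcard meaning "match any value".
INSERT INTO status_lookup VALUES
    ('Sales', 'A', 'Active'),      -- department and source code combined
    ('Sales', 'T', 'Terminated'),  -- department and source code combined
    ('IT',    '*', 'Contractor'),  -- department alone decides
    ('*',     'X', 'On leave');    -- source code alone decides
""")

# Each lookup column matches either the employee's value or the wildcard.
query = """
SELECT e.name, l.status
  FROM employee AS e
  JOIN status_lookup AS l
    ON l.department IN (e.department, '*')
   AND l.source_status IN (e.source_status, '*')
 ORDER BY e.name
"""

rows = conn.execute(query).fetchall()
for name, status in rows:
    print(name, status)
```

Because the rules live in a table, changing the status logic means changing rows rather than code. The sample rules here are chosen so that each employee matches exactly one row; overlapping rules would return duplicates and would need some form of priority to pick a winner.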

Continue reading →

Comma Separator Before Field Name Or After?

Before! Placing the comma separator before the field name is always preferred. There, that was easy.

You’ve come here either to win an argument with a coworker (in which case I hope you’re arguing for the comma separator before field names, or you’re doing it wrong) or to learn. In either case, the comma comes before field names. So, allow me to justify when and, more importantly, why I use one variation over the other:

Example of both comma separators

You’ll notice that the two queries are identical except for the placement of the delimiting comma between the fields in the select clause.
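
As a rough stand-in for the two variations, the sketch below runs a comma-before and a comma-after query against a hypothetical `employee` table using Python's built-in `sqlite3` module. The table and its columns are assumptions made for illustration; the point is only that the two styles differ in layout, not in result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only to make both queries runnable.
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, department TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Ann', 'Sales')")

# Comma separator before each field name.
comma_before = """
SELECT id
     , name
     , department
  FROM employee
"""

# Comma separator after each field name.
comma_after = """
SELECT id,
       name,
       department
  FROM employee
"""

rows_before = conn.execute(comma_before).fetchall()
rows_after = conn.execute(comma_after).fetchall()

# Both styles are the same query as far as the database is concerned.
print(rows_before == rows_after)
```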

There are two schools of thought here:
Continue reading →