Before SQL-92 there was no possibility to use outer joins since they simply were not in the standard. To do an inner join you basically did:
SELECT * FROM t1,t2 WHERE t1.id = t2.id
So there was not even the keyword JOIN that you can use today. Therefore people used subqueries quite a lot since it was the only possibility to do things an outer join can provide.
With outer joins available today there is the recommendation to rather use joins than subqueries since the Query Optimizer of major DBMS systems can handle joins better than subqueries. If you take SQL Server for example you will realize that the query optimizer tries to rewrite any subquery as a join. Since I treated that subject in my master thesis i know that Oracle DB does the same.
So you can help the query optimizer by using joins instead of subqueries especially if your subquery gets very nested or complicated in general.
But don’t just join everything together. Imagine the following example from AdventureWorks2012. You want to find Customers who actually placed an Order.
SELECT a.* FROM Sales.Customer AS a JOIN Sales.SalesOrderHeader AS b ON a.CustomerID = b.CustomerID
The join returns 31465 rows. That seems strange since we only have 19820 Customers. So because we did an inner join we just multiplied many customers since we match every customer to every of his order. To solve this issue we can do a distinct on the customer site.
SELECT DISTINCT a.* FROM Sales.Customer AS a JOIN Sales.SalesOrderHeader AS b ON a.CustomerID = b.CustomerID
Well that seems to be expensive. Remember that normally DISTINCT always means grouping. If you are lucky SQL Server can group by using Stream Aggregate. If you are unlucky it has to use Hash Match which is more cost intensive and time consuming. So is there another straight forward possibility?
SELECT * FROM Sales.Customer AS a WHERE a.CustomerID IN ( SELECT b.CustomerID FROM Sales.SalesOrderHeader AS b )
When you think about it the following query seems to be more appropriate because we only select rows from the customer in the first place where some condition meets. We don’t want to join them.
Now the disappointment. The subquery is transformed into a join. We already assumed this but let's have a look at the execution plan:
As you can see it is actually a merge join. By the way almost the same plan is created for the join distinct statement.
Since there is no difference, let's analyze the plan
Now we see the query optimizer shine. It takes all rows from Sales.SalesOrderHeader in 1 and groups them by CustomerID in 2. So that is the execution of DISTINCT even before the actual join.
In 3 it just reads all rows from Sales.Customer ordered so the query optimizer can now apply a merge join (4). Since the rows are ordered (because of both indexes) and the row set is kind of small it is the best solution to use a merge join here.
We learned there is no difference between simple subqueries and joins but just because the query optimizer is smart enough to build a plan that takes full advantage of the ordered columns. If the optimizer would just apply a join and a distinct afterwards things would become worse.
So in this situation my advice would be to use the subquery nonetheless because it is easier to read for developers. After all the SQL Server does neither execute the join we formulated nor the subquery because he found a more efficient way.