How popular are "And Daughters" businesses?
It's quite common to see high street shops named "Somesuch and Sons". Indeed, my grandparents ran "Eden & Sons" for many years.
Much rarer is seeing "... & daughters".
But, of course, the plural of anecdote is not data!
The UK register of businesses - Companies House - has a pretty good search engine.
Doing a search for AND SON returns 220,000 results. We use the singular because that should also match the plural.
Instinctively, how many "AND DAUGHTER" businesses do you think there are? Fewer? By how much?
A search for AND DAUGHTER returns 206,000 results!
At first glance, they look similar. Yay gender equality! But there's a problem. Both searches also return dissolved companies.
Additionally, "AND SON" also matches "ANDSON" - which distorts the results. How can we get all the live companies in the system? The search doesn't offer any filters.
Companies House offers a nifty search API. But, sometimes, for tasks like this, we just want a honking big CSV. It's 2.2 GB. You're not opening that in Excel. It's PANDAs TIME!
This CSV contains details of all the currently trading businesses in the UK. Let's open it up!
# Import the library
import pandas as pd
# Read only the first column into a dataframe
df = pd.read_csv("BasicCompanyDataAsOneFile-2020-07-01.csv", usecols=["CompanyName"])
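If even one column of a file this size is too much for your machine, Pandas can read it in pieces. This is only a sketch: the in-memory CSV and its rows are invented stand-ins for the real file, and the chunk size is tiny so the example does something visible.

```python
import io
import pandas as pd

# Tiny stand-in for the real 2.2 GB file; these rows are invented.
sample_csv = io.StringIO(
    "CompanyName,CompanyNumber\n"
    "SMITH & SONS LTD,00000001\n"
    "JONES AND DAUGHTERS LIMITED,00000002\n"
    "HENDERSONS BAKERY LTD,00000003\n"
)

# With chunksize set, read_csv returns an iterator of dataframes
# rather than loading every row in one go.
total_rows = 0
for chunk in pd.read_csv(sample_csv, usecols=["CompanyName"], chunksize=2):
    total_rows += len(chunk)

print(total_rows)
```

With the real file you would pass its filename instead of `sample_csv` and pick a much larger `chunksize`.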
What are we looking for? We can't just search for "SONS", as that will bring back things like "HENDERSONS". Some businesses use "AND SONS", others use "& SONS". Some have "&SONS". We also have to account for the singular.
This quick-and-dirty regex will attempt to find any of the above, without also getting "AND SONGS", for example. Do tell me if there's a better way.
(?:AND |\&)\s?SON([S?]|[\s])
In Pandas terms, that's:
df[df['CompanyName'].str.contains(r'(?:AND |\&)\s?SON([S?]|[\s])')]
Which prints out:
CompanyName
258 & SON STUDIO LIMITED
307 &SONS TRADING COMPANY LIMITED
467 (BOWEN AND SONS) BAS MECHANICAL SERVICES LIMITED
19492 1ST CLASS REMOVALS & SONS LTD
28096 24LEX & SON LTD
... ...
4667235 ZOLEE & SON LTD
4668636 ZORAN&SONS LTD
4668689 ZORBAS & SONS LIMITED
4671244 ZYBERI & SONS CAPITAL INVESTMENTS LIMITED
4671350 ZYGMUNT CURRY & SONS LIMITED
[17950 rows x 1 columns]
Or, you can run len() on the output to get the count.
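Before trusting a count, it can help to try the pattern on a handful of names where you already know the answer. This is a rough sketch: the names are invented, `ORIGINAL` is the pattern from this post with its group made non-capturing (so Pandas doesn't warn about match groups), and `TIGHTER` is a hypothetical alternative using word boundaries. It is not the pattern behind the numbers in this post.

```python
import pandas as pd

# Invented names, chosen to poke at the edge cases.
names = pd.Series([
    "SMITH & SON LTD",
    "JONES AND SONS LIMITED",
    "ZORAN&SONS LTD",
    "HENDERSONS LTD",
    "BELL AND SONGS LTD",
    "BRAND SONS LTD",
    "PATEL AND SON",
])

# The pattern from this post, group made non-capturing.
ORIGINAL = r"(?:AND |\&)\s?SON(?:[S?]|[\s])"
# A hypothetical tighter version: AND must start a word, SON(S) must end one.
TIGHTER = r"(?:\bAND\s+|&\s*)SONS?\b"

original_hits = names.str.contains(ORIGINAL)
tighter_hits = names.str.contains(TIGHTER)

# Summing a boolean Series counts the True values.
print(original_hits.sum(), tighter_hits.sum())

# Show the names on which the two patterns disagree.
print(names[original_hits != tighter_hits].tolist())
```

On this sample both patterns find four names, but not the same four: the original picks up "BRAND SONS LTD" (because "BRAND " ends in "AND ") and misses "PATEL AND SON" (nothing follows "SON"), while the tighter one does the opposite.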
Running the same for DAUGHTER returns:
CompanyName
246 & DAUGHTER LIMITED
86594 A STEWART AND DAUGHTERS LIMITED
98526 A.R & DAUGHTERS LIMITED
100064 A.W.F FLETCHER AND DAUGHTERS LLP
179094 AFI AND DAUGHTERS LTD
... ...
4568242 WILSON SON & DAUGHTERS LIMITED
4582108 WIZDOM: BY OSAGIE & DAUGHTERS LIMITED
4583025 WK LUMSDEN AND DAUGHTER LIMITED
4649690 Z.J. KUBANEK & DAUGHTERS LTD
4669405 ZS & DAUGHTERS LTD
[320 rows x 1 columns]
Oh. There are about 56x as many "AND SON" businesses as there are "AND DAUGHTER" businesses. Of course, these data don't tell us anything about the size of the businesses or how successful they are. They don't tell us how many companies are named "sons and daughters". And there are a dozen other little data issues.
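For the curious, here is the arithmetic behind that figure, plus one way you might look for companies that mention both. Treat it as a sketch: the names are invented, and the two patterns are deliberately looser than the one above because they only look for the bare words.

```python
import pandas as pd

# Row counts reported above.
son_count = 17950
daughter_count = 320
ratio = son_count / daughter_count
print(f"{ratio:.1f}x")

# Invented names, to show one way of spotting companies that mention both.
names = pd.Series([
    "WILSON SON & DAUGHTERS LIMITED",
    "SMITH & SONS LTD",
    "KHAN AND DAUGHTERS LTD",
    "MILLER SONS AND DAUGHTERS LTD",
])

# Looser, hypothetical patterns: the bare word, singular or plural.
has_son = names.str.contains(r"\bSONS?\b")
has_daughter = names.str.contains(r"\bDAUGHTERS?\b")

# Keep only the names where both words appear.
both = names[has_son & has_daughter]
print(len(both))
```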
But, I think the trend is clear. Over time, approximately the same number of "& SONS" businesses and "& DAUGHTER" businesses have been registered. But far more DAUGHTERs have been dissolved.
Why is that?
Duggie says:
Looking at the broader set which includes the dissolved businesses, it might provide better understanding to note when the business was trading (what decade or era) and which industry they are part of. That may provide insight into the reason for having fewer now.
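A rough sketch of what that kind of breakdown might look like in Pandas. The column names `IncorporationDate` and `SICCode.SicText_1` are assumptions about how the data is laid out, and the rows (including the industry labels) are invented. The file used in this post only covers live companies, so dissolved ones would need a different source.

```python
import pandas as pd

# Invented rows; the column names are assumed, not confirmed by this post.
df = pd.DataFrame({
    "CompanyName": [
        "A & DAUGHTERS LTD",
        "B AND DAUGHTER LIMITED",
        "C & DAUGHTERS LTD",
    ],
    "IncorporationDate": ["12/03/1987", "01/09/2014", "23/11/2018"],
    "SICCode.SicText_1": [
        "47110 - Retail",
        "43210 - Electrical installation",
        "47110 - Retail",
    ],
})

# Parse day-first dates, then bucket each year into its decade.
dates = pd.to_datetime(df["IncorporationDate"], format="%d/%m/%Y")
df["Decade"] = (dates.dt.year // 10) * 10

# Count companies per decade and per industry.
by_decade = df.groupby("Decade").size()
by_industry = df.groupby("SICCode.SicText_1").size()
print(by_decade)
print(by_industry)
```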