GenProfile
Name to Age Profiler
"What's in a name?" A lot, it turns out...
For organizations looking for a baseline understanding of their customers or constituents, Genprofile.app profiles the age makeup of their constituent base against a user-defined baseline population. Built on Social Security Administration data for over 100,000 first names, combined with the latest Census Bureau age estimates by one-year age group, our profiling engine offers a small retail business or membership organization a way to benchmark representation by age segment within its local market, as a starting point for market analysis and planning.
(Please note that this service is currently designed for US-based persons only.)
This research tool and its companion API will go to general availability on Wednesday, October 1st; until then, we are offering a free profile to organizations on our launch list. Just click the 'Get a Profile' button below and we will get started, or click 'Get More Info' and we will send you more information on the model and how it was constructed.
Great, let's get started...
This application is brought to you by 35-year analytics veteran and TidyAnalytics founder Joel Narducci. TidyAnalytics' goal as a company is to develop apps and datasets that distill complex data for organizations of all stripes in an affordable and immediately useful way. Nothing is more fundamental to the human situation than age and life stage.
Some details first about data security and privacy:
Any data you send is processed and stored in a private data enclave within the Microsoft Azure cloud, including private key-vault and storage accounts, with your data segregated from other client data at all times.
By default, we retain the data you provide for a period of 30 days, unless you expressly direct us in writing otherwise. We retain the profile we generate for you indefinitely, for your convenience, unless you direct us otherwise.
There are a couple of requirements for the format of the data that we will summarize on the next page. Please note that more detailed information will be provided in our upload instructions via email.
Genprofile.app is a profiling engine, and not a process designed to overlay age data onto your existing customer records. It is in no way a threat to the privacy of your customers, who are only analyzed in the aggregate.
Similarly, TidyAnalytics is not an information brokerage and does not retain, broker, or sell individual-level consumer information, and never will.
Because of this, the only data field strictly necessary to send is "first name". That said, we recommend including the appropriate internal record identifier in your upload (customer ID, account #, etc.), as it will likely be useful on your end to know who was profiled and who was not: not all persons will have a name that matches our database. We flag and report back those names that did not profile.
It may only be practical (or allowable) within your organization to provide name data in a summary format, without customer identifiers; if your data does NOT represent one record per unique person, you should include a count of records for that name, and, ideally, an additional field showing a volumetric measure for that name (total visits count, total revenue last 3 months, etc.). The goal here is to generate a statistically representative profile of the segments that are actually driving your business or organization, rather than simply assigning equal weight to whoever happens to be present in your database regardless of contribution.
If provided, we will use this extra valuation field as a weighting factor for the aggregate profile and will generate two profiles for you: one based on a simple count and a second based on the weighted percentages, the latter of which shows the relative contribution of each age cohort. If you do not provide this additional valuation measure, you will get a single profile based on a simple name count.
Person-Level - Weighted (Ideal):
- customer id
- name
- value total (visits, revenue, score, etc.)
Person, Anonymous, Weighted:
- name
- value total (visits, revenue, score, etc.)
Person, Anonymous:
- name (unique person)
Aggregate Formats - acceptable:
- name
- total unique persons
Aggregate Formats - ideal:
- name
- total unique persons
- value total (visits, revenue, score, etc.)
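As a rough illustration of how a count-based profile and a value-weighted profile can differ for the same upload, here is a minimal Python sketch using the "Aggregate Formats - ideal" layout. It is not the actual profiling engine: the `NAME_AGE_SHARES` lookup, the cohort labels, the field names `persons` and `value`, and every number are made-up assumptions for demonstration only.

```python
from collections import defaultdict

# Hypothetical per-name age distributions (shares sum to 1.0 per name).
# These numbers are invented for illustration, not taken from any model.
NAME_AGE_SHARES = {
    "linda": {"under 40": 0.1, "40-64": 0.3, "65+": 0.6},
    "jayden": {"under 40": 0.9, "40-64": 0.1, "65+": 0.0},
}


def build_profile(rows, weight_field):
    """Return (cohort shares, unmatched names), weighting by rows[weight_field]."""
    totals = defaultdict(float)
    unmatched = []
    for row in rows:
        shares = NAME_AGE_SHARES.get(row["name"].strip().lower())
        if shares is None:
            # Names missing from the lookup are flagged, not profiled.
            unmatched.append(row["name"])
            continue
        for cohort, share in shares.items():
            totals[cohort] += share * row[weight_field]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}, unmatched
    return {c: v / grand_total for c, v in totals.items()}, unmatched


# One row per name: total unique persons plus a value total (e.g. revenue).
rows = [
    {"name": "Linda", "persons": 10, "value": 500.0},
    {"name": "Jayden", "persons": 30, "value": 300.0},
    {"name": "Zyxq", "persons": 1, "value": 20.0},
]

count_profile, unmatched = build_profile(rows, "persons")
weighted_profile, _ = build_profile(rows, "value")
print(count_profile)
print(weighted_profile)
print(unmatched)
```

In this toy data the under-40 cohort dominates by head count, while the 65+ cohort carries a much larger share once records are weighted by value, which is the kind of difference the second profile is meant to surface.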
What you get:
We'll send you a time-limited private link where you can see your profile online, as well as download it in PDF form along with the underlying data profile and comparison population measures.
Because of the nature of this no-fee offer, please note that each profile and sponsoring organization is limited to 50,000 input records.
The default comparison base for the age profile is the US population by one-year age group; however, on the following form you can customize the geographic scope of your comparison baseline. If you have a sound understanding of the geographic scope of your audience, this is a powerful feature that you will want to take advantage of. We currently support selection down to the county level, with support for smaller geographic selections coming soon (e.g., a 15-minute trade area).
Finally, please feel free to add any questions or additional information we should know about your use case in the 'Request' area of the next form.
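One common way to compare a profile against a baseline population is a representation index, where 100 means a cohort appears in the profile at the same rate as in the baseline. The Python sketch below assumes that convention; the function name `representation_index`, the cohort labels, and all figures are hypothetical and are not confirmed to match the measures in the delivered report.

```python
def representation_index(profile_shares, baseline_counts):
    """Index each cohort's profile share against its baseline share (100 = parity)."""
    baseline_total = sum(baseline_counts.values())
    index = {}
    for cohort, count in baseline_counts.items():
        baseline_share = count / baseline_total
        if baseline_share == 0:
            # No baseline population in this cohort: index is undefined.
            index[cohort] = None
            continue
        index[cohort] = 100.0 * profile_shares.get(cohort, 0.0) / baseline_share
    return index


# Made-up cohort shares for an organization's profile.
profile = {"under 40": 0.40, "40-64": 0.225, "65+": 0.375}
# Made-up population counts for a chosen baseline (e.g. a single county).
baseline = {"under 40": 5000, "40-64": 3000, "65+": 2000}

index = representation_index(profile, baseline)
for cohort, value in index.items():
    print(cohort, value)
```

With these invented inputs, the 65+ cohort indexes well above 100 (over-represented relative to the baseline) and the other two cohorts fall below it. Swapping in a county baseline instead of a national one changes only `baseline`, which is why the geographic scope of the comparison matters.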
Name to Age Profiler FAQ
Question: | Answer: |
---|---|
What is the source of the data? | The primary source for age baselines is the US Census Bureau, which provides counts of total persons by single year of age in some of its tabulations and population estimates. Names data comes from the Social Security Administration, which publishes name counts of persons born by year, for all years since the agency's inception. The SSA also generates actuarial data in the form of cohort life tables for the US population, which TidyAnalytics then uses to estimate the proportion of persons born each year surviving to 2024; this last step is crucial for determining the correct current representation expectations for each assigned name from a given year. |
Are there any issues to be aware of? | You should keep a couple of things in mind when using these profiles: The SSA data is limited to US-born persons whose state of birth is known. In addition, the names supplied for any given year are limited to those used a minimum of five times in that year, with names under this threshold suppressed due to SSA privacy protocols. The upshot: the initial iteration of the model may not cover a significant share of customers or members coming from diverse, non-US origin populations, who may have uncommon names falling under the coverage threshold or, alternatively, an age distribution for a given covered name that differs from the profile for US-born persons. This is a difficult analytical problem, but we are working diligently on the next generation of the age model, and our goal is to assign all names, including those not found in the SSA data, a credible expected age distribution, based on other characteristics that are known or can be reasonably inferred. TidyAnalytics has a number of other global name datasets in-house that we are reviewing for potential use. We hope to release a refined version of the model sometime in 2026. |
I see that the SSA reports its name data by gender, but I notice that you do not ask for or support inclusion of gender in your matching process. Why? | Setting aside the somewhat contentious nature of the topic these days, the answer to this question is a practical one: many, if not most, users of the profiling process (businesses and organizations) will not have gender recorded in their files, so including gender as a match requirement or supported dimension would reduce the general applicability and usefulness of the profiling tool. The upshot: ignoring gender means the age profiles likely lose some precision (i.e., for names whose pattern of use has differed between males and females through the years); however, the process benefits greatly from the simplicity of a model that collapses the gender counts into a single count for each name, and from not having to deal with gender ambiguity and gender determination in a matching context, or with the wide unavailability of the attribute in source data systems. |
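
The FAQ answers above describe three steps: take births per name by year, collapse the gender counts into one count per name, and scale by the proportion of each birth cohort surviving to 2024. The Python sketch below walks through those steps for one name under simplifying assumptions: the function name `expected_age_distribution`, the data layout, and all numbers are hypothetical, and a single survival proportion per birth year is assumed for both sexes, which may not match the actual model.

```python
REFERENCE_YEAR = 2024


def expected_age_distribution(births_by_year_and_sex, survival_to_reference):
    """Estimate the current age distribution of living bearers of one name.

    births_by_year_and_sex: {(birth_year, sex): births} for a single name.
    survival_to_reference: {birth_year: proportion surviving to REFERENCE_YEAR}.
    """
    living = {}
    for (year, _sex), births in births_by_year_and_sex.items():
        age = REFERENCE_YEAR - year
        # Collapse the sexes into one count per age, scaled by survival.
        living[age] = living.get(age, 0.0) + births * survival_to_reference[year]
    total = sum(living.values())
    return {age: n / total for age, n in sorted(living.items())}


# Made-up birth counts for one name (not real SSA figures).
births = {(1950, "F"): 900, (1950, "M"): 100, (2000, "F"): 400}
# Made-up survival proportions (not real life-table values).
survival = {1950: 0.6, 2000: 0.95}

dist = expected_age_distribution(births, survival)
for age, share in dist.items():
    print(age, round(share, 3))
```

Without the survival step, the 1950 cohort would account for 1,000 of 1,400 births (about 71%); after scaling, it accounts for 600 of 980 expected living bearers (about 61%), which shows why the adjustment shifts older names toward a younger expected profile.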