What’s In a Name?

I wrote on this topic way back in 2016, but when a recent reader pointed out that the original article no longer had its pictures (it happens), it seemed like a good opportunity to write about it once more.

I’m Kurt Cagle, or, according to my birth certificate, Kurt Alan Cagle. My name is Kurt Cagle.

Now, think about that for a bit. The verb to be is remarkably slippery, and it is slippery in just about every language on the planet that has such a construct. For instance, consider the following statements:

I’m Kurt Cagle.

I’m a creator.

These are two of the most fundamental assertions in language. The first statement can be broken down as:

There exists a label associated with the referenced entity that (at least locally) identifies that entity and distinguishes it from other entities.

The second statement can be restated as:

There exists a set associated with the referenced entity that indicates membership of that entity within that set, which in turn has a label.

Makes sense, right? Welcome to the world of ontology!

The nature of names evolved over time. The concept goes way back: the Proto-Indo-European word for name (which had its origins in the Crescent Valley) was nomen (nuh-min), and outside of that family tree, the Chinese root for name is ming, which many linguists would recognize as a cognate form of nomen. This suggests that people have been using names for at least six thousand years, and probably for much longer.

The first names were most likely given names, and were in essence “gifted” names: the name bestowed by others (usually the parent) to suggest a given aspiration, such as Grace, Hope, or Luke (shining), or a beseechment or dedication to a deity, such as Mark (Mars-like or martial), Michael (gift of God), or Gabriel (strength of God), with the suffix “el” in the latter two cases meaning Lord or Ruler (from the Sumerian and Phoenician Ba’el, reflected in the name Al-lah in Arabic and Muslim cultures).

Women’s names were often diminutives of men’s names, where a diminutive was a shortened or “softened” form of a man’s name that usually stemmed from the roots for small, such as Gabriella or Marcia, softened forms of Gabriel and Mark respectively. Women were also given names that reflected beauty, such as plant names (e.g., Holly, Ivy, or Lily) or gem names (Ruby, Pearl). Occasionally male names in languages other than the naming language would become feminized variants, such as the French Jean (John, in English) becoming the feminine form of John in England. In general, there are many more variants of female names than of male ones.

Within family groups, this differentiation was sufficient to ensure uniqueness most of the time, though in small groups you might need adjectives that qualify these names: Big John, Tall John, Red John, and so on. In some cases, especially among rulers, these qualifiers became part of the name; Charlemagne was, for instance, Charles the Great. The word nickname, by the way, has nothing to do with the devil (Old Nick) but instead started out as ekename in Old English, where eke meant “also” or “other”. As eke fell out of usage in favor of also, an ekename became a nekename, with the middle syllable eventually lost to become nickname. Alternative names, synonyms, or aliases are usually weaker because they typically carry weaker authority (a lesson that ontologists should pay especially close attention to).

Once cultures reached a certain size, given names were no longer sufficient to fully differentiate members of that population. One solution to this, seen especially in northern cultures, was to use familial relationships: John, Son of James (John Jameson), was different from John, Son of John (John Johnson). Admittedly, this made more sense in villages where people knew one another’s families reasonably well, but it also accounts for the fact that Johnson is among the most common surnames in areas with strong Nordic roots. In other places (especially in England and Germany) profession names were used to distinguish family lines. Smith, Sawyer (a person who used saws to cut down trees, or a lumberjack), Miller, Tinker (a tin smith), Carpenter, and so on usually uniquely identified a person in that profession, and as family trades were frequently passed down, so too were the differentiating surnames.

Finally, family names also tended to echo prominent place features associated with the family: Lake, Brook, Craig (a mountain), Fields, and so on. This was especially true of nobles and other officials, who usually took the name of a given estate or city that they had dominion over, though the use of originating cities or regions as qualifiers also goes way back.

The use of both a given name and a family name or surname was almost invariably tied to tax collection. For instance, after the invasion of England by Willelm of Normandy (a.k.a. William the Conqueror) in 1066, one of the first orders of business was to identify the wealthy people and property in the country, in a survey known as the Domesday Book. These tax records served to freeze what had been until that time colloquial names (such as the use of professional names like Smith or Miller as differentiators), while also formalizing “House Names” such as the Houses of York or Lancaster (lampshaded in George R.R. Martin’s Game of Thrones series as House Stark and House Lannister respectively).

It’s worth noting that taxonomists and ontologists refer to the combination of given name and family name (or surname) as a qualified name; the surname qualifies the given (or local) name. From a more formal code standpoint, the qualifier acts as a namespace for the terms (names) within that space, and it typically denotes set or class membership. Such a system dramatically reduces the likelihood that a name might refer to more than one person. As such, it is a mechanism for establishing uniqueness in a broader set.

Note that beyond the emergence of given names and surnames, there are other qualifiers that may differentiate a name, such as patronymics (senior, junior, the third, elder, younger, and so on), and honorifics that, ironically, also qualify a person by profession or distinction (sir, which is a contraction of Senior, doctor, reverend, and so on), as well as gender identifiers, up to and including the most recent trend of specifying pronouns for address purposes.

Western European forms also reflect a cultural preference for placing the given name first in narrative prose, though in legal contracts and other communications, the reverse order of surname and given name, separated by a comma, is frequently used to facilitate sorting by family name. Asian countries, on the other hand (with notable exceptions including Thailand and the Philippines), generally place the qualifying name (the surname) first. As such, it is typical to store a common-usage name in the Western style while also storing given names and surnames separately in order to facilitate sorting using either convention.
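The storage convention just described can be sketched in a few lines of Python. This is only an illustration: the StoredName class, its field names, and the sample people other than the author are assumptions made for the example, not part of any model described in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredName:
    """Hypothetical record: one display form plus separately stored parts."""
    display_name: str   # common-usage form, e.g. "Kurt Cagle"
    given_name: str
    family_name: str

    def sort_key(self) -> tuple[str, str]:
        # Family name first, then given name, compared case-insensitively.
        return (self.family_name.casefold(), self.given_name.casefold())

    def family_first(self, separator: str = ", ") -> str:
        # "Cagle, Kurt" style, as used in contracts and sorted listings.
        return f"{self.family_name}{separator}{self.given_name}"


names = [
    StoredName("Kurt Cagle", "Kurt", "Cagle"),
    StoredName("Jane Doe", "Jane", "Doe"),
    StoredName("John Adams", "John", "Adams"),
]
by_family = sorted(names, key=StoredName.sort_key)
```

Because the parts are stored separately, the same record can be rendered given-name-first or family-name-first without trying to parse the display string.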

Cardinality and Reification

It is dangerous to assume that there is always a one-to-one correspondence between a person and a name. Indeed, for fifty percent of the population, it is likely that their name will change at least once in their lifetime. That segment, of course, is women. Until relatively recently (the 1960s in the United States), if a woman married, she was expected to take the surname of her husband. The feminist movement started changing that, partly as a reflection of shifting expectations about property ownership and taxation, and of a weakening of the ecclesiastical view of marriage and divorce. While it is still a fairly low percentage, women in more marriages than ever are choosing to keep their “maiden names” when they marry, or both partners (especially in same-sex relationships) are choosing to create hyphenated surnames that differ from their pre-marriage surnames.

Nonetheless, in modeling individuals, the assumption should be that surnames especially will change over time, and given names may very well change too. Once again, gender plays a role. An individual may either physically change their sex through surgery or may at least publicly present themselves as another gender, with names reflecting this transition.

It’s worth noting that there are always political dimensions to data modeling, and nowhere are they as intense as with identity modeling. Any modeling involves making certain assumptions, and those assumptions are usually informed by cultural norms and expectations. We are now entering a period where identity is fluid: it changes over time based upon gender intent, relational status, professional appellation (The Artist Formerly Known as Prince), and even social context. For instance, you are increasingly seeing gender pronoun preferences (he, him, his; she, her, hers; ze, zir, zis) in social media.

Yet at the same time this adds to the complexity of the model. From a semantics perspective, this recreates a structure that occurs any time you have temporal evolution, what I’d call the now-then pattern.

The now part of the pattern is an assertion that, at the time the assertion is made, is true:

Her name is Jane Doe

The then part of the pattern, on the other hand, is a set of assertions that specify a range (possibly open-ended) identifying an event or state:

This is an event.
This event refers to a property known as name.
The value of this property is Jane Doe.
This event began on March 16, 1993.
This event ended on June 17, 2021.
This event was reported by Kurt Cagle.

This second structure is known in semantic circles as an example of reification, which means that the second set of assertions describes a single relationship. The this in this case is in fact the statement Her name is Jane Doe. For those familiar with SQL, a reification is roughly analogous to a third normal form (3NF) construct.

In more abstract terms, the initial statement can be broken down as:

r = {s->[p]->o}

where s is a reference to a subject entity, p is a reference to a relationship or property, and o is a reference to an object or value relative to that relationship. The reification is then a set of other relationships that refer to the given statement or assertion r:

r is a reification.
r has property p.
r has subject s.
r has object o.
r begins at time t1.
r optionally ends at time t2.
r was reported by m.
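The list above maps naturally onto a small record type. The following Python sketch is one possible reading of it: the Reification class, the holds_on helper, the person:JaneDoe identifier, and the choice to treat the end date as inclusive are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Reification:
    """Hypothetical record for a reified statement r = {s->[p]->o}."""
    subject: str             # s
    predicate: str           # p
    obj: str                 # o
    start: date              # t1
    end: Optional[date]      # t2, None when the range is open-ended
    reported_by: str         # m

    def holds_on(self, day: date) -> bool:
        # Assumption: both the start and end dates are inclusive.
        return self.start <= day and (self.end is None or day <= self.end)


# The "then" assertions from the Jane Doe example, as a single record.
r = Reification(
    subject="person:JaneDoe",   # hypothetical identifier
    predicate="name",
    obj="Jane Doe",
    start=date(1993, 3, 16),
    end=date(2021, 6, 17),
    reported_by="Kurt Cagle",
)
```

Asking r.holds_on(some_date) is the "then" question; the "now" assertion is simply whichever reification holds today.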

The reification is important because it specifies the time to live of a given relationship between two things. Reifications can also hold other metadata (for instance, a pronoun form indicating preferred gender designation). However, it is also worth noting that while you can hold a great deal of information within a reification, doing so adds significantly to the number of assertions (triples) bound to that reification.

In terms of a graph, a reification is in fact the metadata associated with an edge between two given objects. For instance, if s is an airport, o is also an airport, and p is a signal that a route exists between s and o, then r = {s->[p]->o} is in fact the route between s and o:

airport:_SEA airport:hasRoute airport:_DEN (Seattle has a route to Denver).

The route is in effect a reification (especially as routes, which are largely ephemeral and abstract entities, change far more quickly than airports do).

The route can carry a mean travel time as a property on the reification. This is, effectively, contextual information: information that belongs not to either airport but rather to the relationship that exists between the two.

With regard to names, this introduces some interesting modeling issues. A personal name goes from being a simple label to being something with a structure, a presence, and a role or type. More on that in a bit, but before digging into the weeds, it’s time to stress a critical point here:

Reifications are almost invariably trade-offs between the need to handle transients and the complexity of combinatorics. In the case of names, for instance, a given person may have multiple names, though some may be birth names, some nicknames, some professional names, and some the result of a change in marital status or presentation status. An individual may even have multiple names simultaneously. Names are, of course, not necessarily unique, but they still serve as one of the most commonly used identifiers for people, and because of this as much as anything else, this kind of reification makes sense.

Modeling Names (and a Sneak Peek of Templeton)

Given all of this, what would the best model for names look like? The now-then pattern suggests a two-pronged approach: first, model what a Personal Name should look like; then, from the set of all such names for the individual, choose the primary name for that person, the name that is currently used to best represent that person.

The following example is in what I’m calling Templeton (short for RDF Template Notation).

?PersonalName a Class:_PersonalName;
      PersonalName:hasType ?PersonalNameType;
      PersonalName:hasFullName ?fullName;
      PersonalName:hasGivenName ?givenName; #+
      PersonalName:hasSecondaryName ?secondaryName; #*
      PersonalName:hasSignatoryName ?signatoryName; #? ## Name used on a legal document
      PersonalName:hasFamilyName ?familyName; #*
      PersonalName:hasFamilySortName ?sortName; #? ## For useful sorting
      PersonalName:hasHonorific ?honorific; #* ## Mr., Ms., Dr., etc.
      PersonalName:hasPatronymic ?patronymic; #* ## Sr, Jr, III
      PersonalName:hasDistinction ?distinction; #* ## PhD, JD
      PersonalName:hasNominativePronoun ?nominativePronoun; #? ## he, she, ze
      PersonalName:hasPossessivePronoun ?possessivePronoun; #? ## his,hers,zes
      PersonalName:hasObjectivePronoun ?objectivePronoun; #? ## him,her,zem
      PersonalName:hasStartDate ?startDate; #? xsd:date
      PersonalName:hasEndDate ?endDate; #? xsd:date
      PersonalName:hasLanguage ?language; #? ## indicates the language code of the name (en,de,es,zh, etc.)
      .

PersonalName:hasFullName a Class:_Property;
      rdfs:subPropertyOf rdfs:label
      .

?Person a Class:_Person;
      Person:hasPrimaryPersonalName ?PersonalName;
      Person:hasPersonalName ?PersonalName; #+
      Person:hasPrimaryNameString ?fullName;
      .

?PersonalNameType a Class:_PersonalNameType.

%[
PersonalNameType:_BirthName,
PersonalNameType:_AdoptedName,
PersonalNameType:_LegalChangedName,
PersonalNameType:_ProfessionalName,
PersonalNameType:_MarriedName,
PersonalNameType:_LegalAlias,
PersonalNameType:_IllegalAlias,
PersonalNameType:_NickName,
]% a Class:_PersonalNameType.

First, a few words about the notation. The core of it (just as with SPARQL) is Turtle as a way of describing assertions (triples here). Variable names (beginning with a question mark) provide a label, and in some cases (such as ?fullName) a value used in multiple assertion templates. If a line is indented (and the previous line ends with a semicolon), then the un-indented first term remains in force. For instance,

?PersonalName a Class:_PersonalName;
      PersonalName:hasType ?PersonalNameType;
      .

is short for

?PersonalName a Class:_PersonalName.
?PersonalName PersonalName:hasType ?PersonalNameType.

The hash mark (#) is a comment, but in the template it is used to signal cardinality. Thus #* indicates that the previous assertion may be repeated zero or more times, #+ indicates a one-or-more repetition, and #? indicates an optional assertion. If a variable begins with an uppercase letter, it indicates an IRI (or reference pointer); if it begins with a lowercase letter, though, then the value is an atomic value, defaulting to a string. Thus,

?PersonalName PersonalName:hasStartDate ?startDate; #? xsd:date

indicates that ?startDate in this particular case is a date.
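To make the cardinality and variable conventions concrete, here is a small Python sketch of how a reader of the notation might interpret a single template line. It is not the Templeton parser mentioned in this article; the function names are hypothetical, and treating an unmarked assertion as "exactly one" is an assumption, since the notation as described only defines the three markers.

```python
import re
from typing import Optional

# Cardinality markers described in the text: (minimum, maximum), None = unbounded.
_MARKERS = {"*": (0, None), "+": (1, None), "?": (0, 1)}


def cardinality_of(line: str) -> tuple[int, Optional[int]]:
    """Return the (min, max) occurrence bounds signalled on a template line."""
    match = re.search(r"#([*+?])", line)
    if match is None:
        return (1, 1)  # assumption: no marker means exactly one
    return _MARKERS[match.group(1)]


def variable_kind(variable: str) -> str:
    """Uppercase initial means an IRI reference; lowercase means an atomic value."""
    name = variable.lstrip("?")
    return "iri" if name[:1].isupper() else "literal"
```

For example, cardinality_of applied to the hasGivenName line reports one-or-more, while variable_kind("?PersonalName") reports an IRI and variable_kind("?fullName") a literal.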

The notation

%[a,b,c,…]% a Class:_PersonalNameType.

indicates that each item in the list is a subject of the associated predicate and object, which is very useful for specifying type enumerations. Finally, the lone a is shorthand for rdf:type.
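As a rough illustration of what that enumeration shorthand expands to, the following Python sketch produces one triple per listed subject. The expand_enumeration function is hypothetical and is not part of Templeton itself; it only mirrors the behavior described above, including the treatment of a as rdf:type.

```python
def expand_enumeration(
    subjects: list[str], predicate: str, obj: str
) -> list[tuple[str, str, str]]:
    """Expand %[s1, s2, ...]% predicate obj. into one triple per subject."""
    resolved = "rdf:type" if predicate == "a" else predicate
    return [(subject, resolved, obj) for subject in subjects]


triples = expand_enumeration(
    ["PersonalNameType:_BirthName", "PersonalNameType:_NickName"],
    "a",
    "Class:_PersonalNameType",
)
```

Each resulting tuple is an ordinary subject, predicate, object assertion, so the shorthand adds no new semantics; it only saves repetition.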

Note: Templeton is a shorthand templating notation I’ve been developing as a way of creating schemas that can be expanded to OWL, SHACL, XML Schema, or JSON-Schema. I’m working on a parser for it now.

Of Compositions, Associations, and the Now/Then Pattern

The modeling of PersonalName should seem straightforward, with a few caveats. First, it has been my observation, working with dozens of ontologies over the years, that almost any time you define a class, there is usually some kind of intent indicator needed. Such indicators do not materially change the definition of the class, but they do provide a level of context about what a particular instance is meant to do. For instance, PersonalNameType identifies whether something is a birth name, a married name, an alias, or a professional name (among others). These are differentiated from subclasses because they do not change any other properties.

The second caveat has to do with modeling. UML differentiates between a composition and an association. An association typically describes a relationship between two disparate entities, and in semantic parlance could be considered the same as a reification (or a third normal form construct in SQL). A composition, on the other hand, occurs when there is an existential dependency between the subject and object. For instance, even if you have two people who have the same personal name, those two instances are distinct (having different start and end dates, for instance). Should a person be deleted from the database, all of the names associated with that person would also need to be deleted (which is not true for associations).

In my own modeling, compositions should always belong to the reference subject, or, put another way, the relationship points from the subject to the thing semantically. Associations, on the other hand, are usually reifications: there is a reifying object, such as the route in our airport example, that binds two entities together. If you delete the reification (the route, here), you do not in this case delete the associated entities (the airports).

There are some objects that seem to skirt the boundaries. An address is a good example. If a person has an associated address, a naive modeling would make the address a composition. However, it is not. Multiple people can live at the same address. If one person moves away, that does not cause the address itself to “disappear”. This also means that the association of a person with an address should be seen as a reification. I use the term Habitation as the class for that reification, one that points to both a person and an address:

?Habitation a Class:_Habitation;
     Habitation:hasType ?HabitationType;
     Habitation:hasTenant ?Person;
     Habitation:hasAddress ?Address;
     Habitation:hasStartDate ?startDate;
     Habitation:hasEndDate ?endDate; #?
     .
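The difference in delete behavior between a composition (names) and an association (habitations) can be sketched with a toy in-memory store. Everything here is an assumption made for illustration: the Store class, its fields, and the identifiers are hypothetical, and a real triple store would express the same rules as update operations over the graph.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical in-memory store contrasting composition and association."""
    # Composition: names exist only as part of their person.
    names_by_person: dict[str, list[str]] = field(default_factory=dict)
    # Addresses are independent entities.
    addresses: set[str] = field(default_factory=set)
    # Association: each (person, address) pair stands in for a Habitation.
    habitations: list[tuple[str, str]] = field(default_factory=list)

    def delete_person(self, person: str) -> None:
        # The person's names cannot outlive the person.
        self.names_by_person.pop(person, None)
        # Drop the reifying Habitation records, but leave the addresses alone.
        self.habitations = [(p, a) for (p, a) in self.habitations if p != person]


store = Store()
store.names_by_person["person:JaneDoe"] = ["Jane Doe"]
store.names_by_person["person:JohnDoe"] = ["John Doe"]
store.addresses.add("address:1")
store.habitations += [("person:JaneDoe", "address:1"), ("person:JohnDoe", "address:1")]

store.delete_person("person:JaneDoe")
```

After the delete, the shared address and the other tenant's habitation both survive, which is exactly why the address cannot be modeled as a composition of either person.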

Regardless of whether something is a composition or an association, there are times when you simply want to know what a person’s current primary name is, without having to construct complex queries to find it. This is where inferred triples come into play. An inferred triple is usually generated, either through a SPARQL Update query or as part of a CONSTRUCT (these are more or less the same, depending upon how inferred triples are persisted).

For instance, the following SPARQL Update will change the primary name for a person to the specified value:

# Update Primary Name
# (prefix declarations omitted; the person must already have a primary name)
delete {
    ?Person Person:hasPrimaryPersonalName ?oldPrimaryName;
            Person:hasPrimaryNameString ?oldFullName.
    }
insert {
    ?Person Person:hasPrimaryPersonalName ?newPrimaryName;
            Person:hasPrimaryNameString ?newFullName.
    }
where {
    values (?Person ?newPrimaryName) {(Person:_JaneDoe PersonalName:_JaneDoeBirthName)}
    ?Person Person:hasPrimaryPersonalName ?oldPrimaryName.
    ?Person Person:hasPrimaryNameString ?oldFullName.
    ?newPrimaryName PersonalName:hasFullName ?newFullName.
    }

   

   

Inferred triples are frequently transitory assertions: they reflect the default value from a set of objects, but that default can change, and they frequently provide a way of short-circuiting complex queries. For instance, Person:hasPrimaryNameString is the string representation of the default personal name. This can be made even more powerful by making that particular property a subproperty of something like skos:prefLabel (assuming a basic inference engine), so that a naive query, such as:

select ?s ?name where {
    ?s skos:prefLabel ?name.
    filter (contains(?name, 'Jane Doe'))
}

will return a list of all entities whose preferred label contains “Jane Doe”. Note that this is not a terribly efficient query, but it can be useful nonetheless.

So when you are thinking about the design of your models, identify those properties that you would intuitively want to see for the classes in question and that can be inferred or derived, and in effect pre-generate or update those properties as the state of the thing changes, so that your users do not have to construct complex queries. Remember, a triple store is an index, and such actions can be considered optimizing that index.
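The same idea, keeping a derived property in step with the state it is derived from, can be sketched outside of a triple store. In this Python illustration the PersonalName and Person classes, the set_primary method, and the name "Jane Smith" are all hypothetical; the point is only that the derived string is rewritten at update time rather than computed at query time.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PersonalName:
    full_name: str
    name_type: str   # e.g. "BirthName", "MarriedName"


@dataclass
class Person:
    names: list[PersonalName] = field(default_factory=list)
    primary_name: Optional[PersonalName] = None
    # Derived shortcut, analogous to a pre-generated primary name string.
    primary_name_string: Optional[str] = None

    def set_primary(self, name: PersonalName) -> None:
        # Keep the full history of names, then refresh both the pointer
        # and the derived string in a single step.
        if name not in self.names:
            self.names.append(name)
        self.primary_name = name
        self.primary_name_string = name.full_name


jane = Person()
birth = PersonalName("Jane Doe", "BirthName")
married = PersonalName("Jane Smith", "MarriedName")   # hypothetical later name
jane.set_primary(birth)
jane.set_primary(married)
```

Readers of jane.primary_name_string never need to walk the list of names; the cost of keeping it current is paid once, when the primary name changes.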

Summary

Modeling, when it comes right down to it, is the process of questioning your assumptions and optimizations. A big problem that arises with most conventional SQL systems is that many database modelers optimize for complexity by reducing the number of database tables and joins, but this also reduces the contextual metadata that is increasingly a requirement in today’s data-rich world.