What is this Data Schema?

As part of onboarding, we connect your database schema so we can give the model context about what the data it is querying looks like. After looking at the schema, our model selects a few specific properties it thinks will be relevant to users' searches. This avoids storing things like PII or unnecessary admin/operational values. While doing this, the model also tries to fill gaps in the schema by creating new fields that it thinks will be useful for natural language (NL) queries; these can be generated either by looking at the data or by generative AI. Based on your inputs from the onboarding form, we create a data schema. This schema is represented in JSON, and each entry has 3 fields:
  1. Name: Name of the field / property
  2. Description: The description of the value that is stored by this property
  3. Exists: The mapping to where this value is stored. For example, suppose we are looking at the discounted price, which is stored in the table prices with keys like ‘original_price’, ‘discounted_price’, etc. In that case, the value of this key will be prices.discounted_price. For DynamoDB or other NoSQL databases, this directly stores the column name.
Not every field maps to a stored value, because we generate a few values for better search results. For example, on an e-commerce platform serving multiple brands, a natural language query might be “Find me a polo T-shirt from a luxury brand” or “Find me a polo T-shirt from a reputed brand”. In either case, the value will not be present in any database unless we do extreme categorization. So, while combing through the data, we use the model to look up brands and figure out whether each one is a luxury brand. In the second case, we can average the user ratings and tag the brands as reputed or otherwise.
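
The "reputed brand" case above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `tag_reputed_brands`, the review record shape, and the 4.0 rating threshold are hypothetical and are not taken from the actual pipeline.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cut-off; the real threshold is not specified in this document.
REPUTED_MIN_AVG_RATING = 4.0


def tag_reputed_brands(reviews, threshold=REPUTED_MIN_AVG_RATING):
    """Return {brand: bool}, True when the brand's average rating meets the threshold."""
    ratings_by_brand = defaultdict(list)
    for review in reviews:
        ratings_by_brand[review["brand"]].append(review["rating"])
    return {
        brand: mean(ratings) >= threshold
        for brand, ratings in ratings_by_brand.items()
    }


# Made-up sample data, for illustration only.
reviews = [
    {"brand": "Acme", "rating": 5},
    {"brand": "Acme", "rating": 4},
    {"brand": "Bolt", "rating": 3},
    {"brand": "Bolt", "rating": 2},
]
brand_tags = tag_reputed_brands(reviews)
print(brand_tags)
```

A generated value like this has no column of its own in the source database, which is why it cannot be mapped to a table and key.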

Example Search schema for a Real Estate brand

Note: the schema has been trimmed, but you can experience this schema in action at our real estate demo.
Realestate.json
{
    "half_bathrooms": {
      "type": "integer",
      "description": "Number of half bathrooms",
      "exists": "properties.half_bathrooms"
    },
    "agent_name": {
      "type": "varchar",
      "description": "Agent full name",
      "exists": "agents.first_name, agents.last_name"
    },
    "pet_friendly": {
      "type": "boolean",
      "description": "Whether pets are allowed",
      "exists": "properties.pet_friendly"
    },
    "agent_phone": {
      "type": "varchar",
      "description": "Agent phone number",
      "exists": "agents.phone"
    },
	...
    "property_age_category": {
      "type": "varchar",
      "description": "Property age category (new/modern/established/vintage) - generated from year_built",
      "exists": false
    },
    "maintenance_level": {
      "type": "varchar",
      "description": "Expected maintenance level (low/medium/high) - generated from property age and type",
      "exists": false
    },
    "commuter_friendly": {
      "type": "varchar",
      "description": "Commuter friendliness (excellent/good/fair/poor) - generated from location and walkability",
      "exists": false
    },
    "size_category": {
      "type": "varchar",
      "description": "Size category (compact/medium/large/estate) - generated from total_area_sqft",
      "exists": false
    }
}
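
As a rough sketch of how a consumer might read a schema like the one above, the Python below separates fields that map to stored columns from fields that are generated (`"exists": false`). The helper name `split_fields` is hypothetical, and splitting a multi-column mapping on commas is an assumption based on the `agent_name` entry, not a documented format.

```python
import json

# Two entries copied from the example schema above.
schema_json = """
{
  "agent_name": {
    "type": "varchar",
    "description": "Agent full name",
    "exists": "agents.first_name, agents.last_name"
  },
  "size_category": {
    "type": "varchar",
    "description": "Size category (compact/medium/large/estate) - generated from total_area_sqft",
    "exists": false
  }
}
"""


def split_fields(schema):
    """Return (stored, generated): stored maps field name -> list of source columns."""
    stored, generated = {}, []
    for name, spec in schema.items():
        exists = spec["exists"]
        if exists is False:
            # No backing column: the value is generated.
            generated.append(name)
        else:
            # Assumption: several source columns are separated by commas.
            stored[name] = [part.strip() for part in exists.split(",")]
    return stored, generated


schema = json.loads(schema_json)
stored, generated = split_fields(schema)
print(stored)
print(generated)
```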