{"id":26103,"date":"2020-12-02T13:32:32","date_gmt":"2020-12-02T13:32:32","guid":{"rendered":"https:\/\/merikebi.warrenmyers.com\/?p=26103"},"modified":"2020-12-02T13:32:32","modified_gmt":"2020-12-02T13:32:32","slug":"answer-by-warren-for-splunk-record-deduplication-using-an-unique-field","status":"publish","type":"post","link":"https:\/\/merikebi.warrenmyers.com\/?p=26103","title":{"rendered":"Answer by warren for Splunk : Record deduplication using an unique field"},"content":{"rendered":"<p>What are you trying to achieve?<\/p>\n<p>If you&#8217;re sending &quot;unique&quot; events to the HEC, or you&#8217;re running UFs on &quot;unique&quot; logs, you&#8217;ll never get duplicate &quot;records when indexing&quot;.<\/p>\n<p>It <em>sounds<\/em> like you (perhaps routinely?) resend the same data to your aggregation platform &#8211; which is not a problem with the <em>aggregator<\/em>, but with your <em>sending<\/em> process.<\/p>\n<p>Almost like you&#8217;re doing a <a href=\"https:\/\/stackoverflow.com\/q\/1361340\/4418\">MySQL<\/a>\/<a href=\"https:\/\/stackoverflow.com\/a\/34639631\/4418\">PostgreSQL<\/a> &quot;insert if not exists&quot; operation. If that is a correct understanding of your situation, based on your statement<\/p>\n<blockquote>\n<p>We currently use &quot;document id&quot; in ElasticSearch to deduplicate records when indexing:<br \/>\n<a href=\"https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/reference\/current\/docs-index_.html\" rel=\"nofollow noreferrer\">https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/reference\/current\/docs-index_.html<\/a><br \/>\nWe generate the id using hash of the content of the each log-record.<\/p>\n<\/blockquote>\n<p>then you need to evaluate what is going &quot;wrong&quot; in your <em>sending<\/em> process that you feel you need to pre-clean the data before ingesting it.<\/p>\n<p>It is true that Splunk won&#8217;t &quot;deduplicate records when indexing&quot; &#8211; because it <em>presumes<\/em> the data coming-in to be &#8216;correct&#8217; from whatever is submitting it.<\/p>\n<p>How are you getting duplicate data in the first place?<\/p>\n<p>Fields in Splunk which begin with the underscore (eg <code>_time<\/code>, <code>_cd<\/code>, etc) are <em><strong>not<\/strong><\/em> editable\/sendable &#8211; they&#8217;re generated by Splunk when it receives data. IOW, they&#8217;re all <em>internal<\/em> fields. Searchable. Usable. But not overrideable.<\/p>\n<p>If you <em>really<\/em> have a problem with [lots of\/too much] duplicate data, <em><strong>and there is no way to fix your sending process[es]<\/strong><\/em>, then you&#8217;ll need to rely on deduplication <a href=\"https:\/\/docs.splunk.com\/Documentation\/Splunk\/latest\/Search\/Aboutthesearchlanguage\" rel=\"nofollow noreferrer\">operations<\/a> in <a href=\"https:\/\/docs.splunk.com\/Splexicon:SPL\" rel=\"nofollow noreferrer\">SPL<\/a> when <a href=\"https:\/\/docs.splunk.com\/Documentation\/Splunk\/latest\/SearchReference\/UnderstandingSPLsyntax\" rel=\"nofollow noreferrer\">searching<\/a> for\/reporting on whatever you&#8217;ve ingested (<em>primarily<\/em> by using <a href=\"https:\/\/docs.splunk.com\/Documentation\/Splunk\/latest\/SearchReference\/stats\" rel=\"nofollow noreferrer\"><code>stats<\/code><\/a> and, when absolutely necessary\/unavoidable, <a href=\"https:\/\/antipaucity.com\/2018\/03\/08\/more-thoughts-on-stats-vs-dedup-in-splunk\/#.X8eVINtOmc4\" rel=\"nofollow noreferrer\"><code>dedup<\/code><\/a>).<\/p>\n<p>from User warren &#8211; Stack Overflow https:\/\/stackoverflow.com\/questions\/65101933\/splunk-record-deduplication-using-an-unique-field\/65109031#65109031<br \/>\nvia <a href=\"https:\/\/ifttt.com\/?ref=da&#038;site=wordpress\">IFTTT<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>What are you trying to achieve? If you&#8217;re sending &quot;unique&quot; events to the HEC, or you&#8217;re running UFs on &quot;unique&quot; logs, you&#8217;ll never get duplicate &quot;records when indexing&quot;. It sounds like you (perhaps routinely?) resend the same data to your aggregation platform &#8211; which is not a problem with the aggregator, but with your sending &hellip;<br \/><a href=\"https:\/\/merikebi.warrenmyers.com\/?p=26103\" class=\"more-link pen_button pen_element_default pen_icon_arrow_double\">Continue reading <span class=\"screen-reader-text\">Answer by warren for Splunk : Record deduplication using an unique field<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[4],"tags":[991],"keyring_services":[],"class_list":["post-26103","post","type-post","status-publish","format-standard","hentry","category-blih","tag-stackexchange"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/merikebi.warrenmyers.com\/index.php?rest_route=\/wp\/v2\/posts\/26103","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/merikebi.warrenmyers.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/merikebi.warrenmyers.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/merikebi.warrenmyers.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/merikebi.warrenmyers.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=26103"}],"version-history":[{"count":1,"href":"https:\/\/merikebi.warrenmyers.com\/index.php?rest_route=\/wp\/v2\/posts\/26103\/revisions"}],"predecessor-version":[{"id":26104,"href":"https:\/\/merikebi.warrenmyers.com\/index.php?rest_route=\/wp\/v2\/posts\/26103\/revisions\/26104"}],"wp:attachment":[{"href":"https:\/\/merikebi.warrenmyers.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=26103"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/merikebi.warrenmyers.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=26103"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/merikebi.warrenmyers.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=26103"},{"taxonomy":"keyring_services","embeddable":true,"href":"https:\/\/merikebi.warrenmyers.com\/index.php?rest_route=%2Fwp%2Fv2%2Fkeyring_services&post=26103"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}