How do I get a Wyam pipeline of documents based on a comma-delimited meta value from a previous pipeline?
I have a Wyam pipeline called "posts" filled with documents. Some of these documents have a tags meta value, which is a comma-delimited list of tags. For example, let's say it has three documents, with tags meta of:
    gumby,pokey
    gumby,oscar
    oscar,kermit
I want a new pipeline filled with one document for each unique tag found in the documents in the "posts" pipeline. These documents should have the tag in a meta value called tagname.
So, the above values should result in a new pipeline consisting of four documents, with tagname meta values of:
    gumby
    pokey
    oscar
    kermit
Here is my solution. This technically works, but I feel it's inefficient, and I'm pretty sure there has to be a better way.
    Documents(c => c.Documents["posts"]
        .Select(d => d.String("tags", string.Empty))
        .SelectMany(s => s.Split(",".ToCharArray()))
        .Select(s => s.Trim().ToLower())
        .Distinct()
        .Select(s => c.GetNewDocument(
            string.Empty,
            new List<KeyValuePair<string, object>>()
            {
                new KeyValuePair<string, object>("tagname", s)
            }
        ))
    )
So, I'm calling Documents, passing in a ContextConfig which:
- Gets the documents from "posts" (I now have a collection of documents)
- Selects the tags meta value (I now have a collection of strings)
- Splits each value on the comma (a bigger collection of strings)
- Then trims and lower-cases each one (still a collection of strings)
- De-dupes the result (a smaller collection of strings)
- Then creates a new document for each value in the list, with an empty body and the string as its tagname value (I should end up with a collection of new documents)
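The string-handling steps in this list can be tried outside Wyam with plain LINQ. The following is a minimal sketch, not part of the original config: the `TagSteps` class, the `UniqueTags` helper, and the sample values are made up for illustration, and the `Where` filter that skips empty entries is an added assumption (the expression above would instead produce an empty tag for a document with no tags value).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TagSteps
{
    // Hypothetical helper: split, trim, lower-case and de-dupe raw "tags" values.
    public static List<string> UniqueTags(IEnumerable<string> rawTagValues)
    {
        return rawTagValues
            .SelectMany(s => s.Split(','))             // a bigger collection of strings
            .Select(s => s.Trim().ToLowerInvariant())  // normalize each tag
            .Where(s => s.Length > 0)                  // assumption: ignore empty entries
            .Distinct()                                // a smaller collection of strings
            .ToList();
    }

    public static void Main()
    {
        var raw = new[] { "gumby,pokey", "gumby,oscar", "oscar,kermit" };
        foreach (var tag in UniqueTags(raw))
        {
            Console.WriteLine(tag);
        }
    }
}
```

Run on the three sample values, this yields the four unique tags; each one would then become the tagname of a new document.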
Again, this works. But is there a better way?
That's not bad at all. Part of the challenge here is getting the comma-separated list of tags into something that can be processed by a LINQ expression or similar. That part is probably unavoidable, and it accounts for 3 of the lines in your expression.
That said, Wyam does provide a little help here with the ToLookup() extension (see the bottom of this page: http://wyam.io/getting-started/concepts).
Here's how that might look (this code is from a self-contained LINQPad script and would need to be adjusted for use in a Wyam config file):
    public void Main()
    {
        Engine engine = new Engine();

        engine.Pipelines.Add("posts",
            new PostsDocuments(),
            new Meta("tagarray", (doc, ctx) => doc.String("tags")
                .ToLowerInvariant().Split(',').Select(x => x.Trim()).ToArray())
        );

        engine.Pipelines.Add("tags",
            new Documents(ctx => ctx.Documents["posts"]
                .ToLookup<string>("tagarray")
                .Select(x => ctx.GetNewDocument(new MetadataItems { { "tagname", x.Key } }))),
            new Execute((doc, ctx) =>
            {
                Console.WriteLine(doc["tagname"]);
                return null;
            })
        );

        engine.Execute();
    }

    public class PostsDocuments : IModule
    {
        public IEnumerable<IDocument> Execute(IReadOnlyList<IDocument> inputs, IExecutionContext context)
        {
            yield return context.GetNewDocument(new MetadataItems { { "tags", "gumby,pokey" } });
            yield return context.GetNewDocument(new MetadataItems { { "tags", "gumby,oscar" } });
            yield return context.GetNewDocument(new MetadataItems { { "tags", "oscar,kermit" } });
        }
    }
Output:
    gumby
    pokey
    oscar
    kermit
There's a lot of housekeeping there to set up the fake environment for testing. The important part that you're looking for is this:
    engine.Pipelines.Add("tags",
        new Documents(ctx => ctx.Documents["posts"]
            .ToLookup<string>("tagarray")
            .Select(x => ctx.GetNewDocument(new MetadataItems { { "tagname", x.Key } }))),
        // ...
    );
Note that you still have to do the work of getting the comma-delimited tags list into an array. It's just happening earlier, in the "posts" pipeline.
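To see why a lookup keyed by tag gives one entry per unique tag, the same idea can be sketched with the standard `Enumerable.ToLookup` from plain LINQ. This is only an illustration of the concept, not Wyam's implementation: the `Post` class, its property names, and the `LookupSketch.ByTag` helper are hypothetical stand-ins for documents that carry a tag array.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a document with a tag array; not a Wyam type.
public class Post
{
    public string Title { get; set; } = "";
    public string[] TagArray { get; set; } = Array.Empty<string>();
}

public static class LookupSketch
{
    // Build a lookup with one key per distinct tag, mapping to the posts that carry it.
    public static ILookup<string, Post> ByTag(IEnumerable<Post> posts)
    {
        return posts
            .SelectMany(p => p.TagArray.Distinct(), (p, tag) => new { Post = p, Tag = tag })
            .ToLookup(x => x.Tag, x => x.Post);
    }

    public static void Main()
    {
        var posts = new[]
        {
            new Post { Title = "first",  TagArray = new[] { "gumby", "pokey" } },
            new Post { Title = "second", TagArray = new[] { "gumby", "oscar" } },
            new Post { Title = "third",  TagArray = new[] { "oscar", "kermit" } },
        };

        // Each grouping's Key is one unique tag, analogous to x.Key in the pipeline.
        foreach (var group in ByTag(posts))
        {
            Console.WriteLine($"{group.Key}: {group.Count()} post(s)");
        }
    }
}
```

With the sample data, the lookup has four keys: gumby and oscar each map to two posts, pokey and kermit to one. A side benefit of a lookup over a bare `Distinct()` is that the posts for each tag remain available, which would be useful if the tag documents later needed to list them.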